
Are Bad Statistics Making My Query Slow? (Dear SQL DBA Episode 39)

An important query is suddenly slow. Is it because statistics are out of date? This is tricky to figure out, and updating statistics right away can make troubleshooting even harder. Learn how to use query execution plans to get to the heart of the question and find out if stats are really your problem, or if it’s something else.

In this 35 minute episode:

  • 00:39 SQL Server 2017 Announced
  • 01:10 New video from Microsoft’s Joe Sack demonstrating Adaptive Query Processing
  • 03:05 This week’s question: Are bad stats making my query slow?
  • 05:26 Demo of finding plan in cache and analyzing stats begins
  • 28:17 What to do when stats ARE the problem

Code samples: https://gist.github.com/LitKnd/f07848d59cedc61fd057d12ab966f703
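The gist above has the full demo scripts. As a rough sketch of the general approach (not the demo code itself, and with dbo.YourTableName as a placeholder), you can find a query's plan in cache and then check how current the statistics on the table are:

-- Find the cached plan and runtime stats for queries touching the table.
SELECT TOP (10)
    qs.execution_count,
    qs.total_worker_time,
    st.text AS query_text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE st.text LIKE N'%YourTableName%'
ORDER BY qs.total_worker_time DESC;

-- Then see when each statistic on the table was last updated,
-- and how many rows have changed since.
SELECT s.name AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.YourTableName');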

Audio-only version (downloadable)

Video version

Related links

SQL Server 2017 Adaptive Query Processing video by Joe Sack

Michael J Swart on finding Dark Matter Queries

Slow in the Application, Fast in SSMS? An SQL text by Erland Sommarskog

Got a question for Dear SQL DBA? Ask!

Will the Cloud Eat My DBA Job? (Dear SQL DBA Episode 31)

Will cloud services render DBAs obsolete? Does the cloud mean that developers will use less of SQL Server? In this post I talk about the future of database administration and give tips on strategizing your career.

Watch this week’s 28 minute episode, or scroll down to read a written version of the answer. We YouTubers love it when you subscribe to our YouTube channels. You can also get this by podcast (and I would love it if you left a review on iTunes).

Dear SQL DBA…

I moved into the role of DBA about a year ago and still have a lot to learn, however I’m worried…

  • Cloud adoption may move more and more companies away from SQL Server (due to cost, for instance)
  • As the transition to the cloud becomes more commonplace the DBA job will become a service included in that offering

I responded with a few questions of my own:

  1. What made you decide to be a SQL Server DBA, specifically?
  2. What do you like about being a DBA? What parts of your job do you wish you could do all the time?
  3. Are you curious about cloud technologies at all? Or do they seem uninteresting to you?

We’re talking about career strategy here. In this context, it’s important to reflect a little on how you got to where you are now, what you enjoy doing, and what you’re interested in.

We’ll get back to how those questions were answered soon.

Lots of things have been reported to kill the DBA over the years

SQL Server 2005 was said to be “self-tuning”! Who needs a DBA when the instance tunes itself? (Apparently everyone.)

Outsourcing: All the DBA jobs are going to X location, then Y location, then Z location. Then back to X. DBA jobs have become more global, but “outsourcing” hasn’t gotten rid of DBA jobs in the United States. It has been part of the trend to make working remotely more normal and easy, which is generally good for DBAs.

DevOps! All the developers will manage everything. And somehow they’ll know how to do so. I love DevOps, and I have seen it wipe out some QA departments, but I haven’t seen it wipe out DBAs. I think it’s fun to be a DBA working with a DevOps team.

One job is going away… slowly

“Old School Databass Administrator”

Duties:

  • Installs SQL Server
  • Sets up and monitors jobs for backups, CHECKDB, and index maintenance
  • Manages user permissions
  • Stays in cube, silently

I don’t think the cloud specifically is killing off this role. IT teams are being asked to communicate better, pay more attention to their customers, and become more dynamic and specialized.

Who’s a DBA’s #1 customer?

Developers wanna develop

The cloud enables developers to use a lot of technologies.

But that doesn’t necessarily mean less SQL Server will be used.

The cloud is also bringing new ways for SQL Server to be cost effective – like elastic pools.

Developers have a massive challenge: there’s an incredible amount for them to learn.

Microsoft has been changing

It’s not just cloud — it’s things like SQL Server on Linux in vNext of SQL Server.

Yeah, you can spin up a Linux VM in the cloud and run SQL Server on it. But you can also do it on prem. Or in a container. You can do it anywhere. That’s really cool.

Microsoft is also embracing lots of technology in their cloud. You can bring data from multiple data sources together– including SQL Server.

Don’t get threatened by the fact that it’s not all SQL Server. If customers can do more and build more, and if there’s many ways to include SQL Server in that, it’s going to get used. It has tons of cool features for performance, and it’s a proven relational database with a really solid foundation.

Developers are still gonna need help

  • “This looks weird, what’s happening in the database?”
  • “Why is this slow?”
  • “What can we do to make queries faster without changing the code?”
  • “Is there a way to do this more cheaply?”

And also…

  • “Can we roll our own solution?”

The complexity developers face is only expanding. For those teams who choose SQL Server, there’s plenty of need for specialists.

Business owners will still need help

They have their own questions:

  • “What are the developers not seeing?”
  • “Are we meeting our SLAs, RPOs, RTOs and other acronyms?”
  • “Is our data secure?”

And also…

  • “Should we be in the cloud, or running our own?”

Remember how once upon a time, virtualization was going to completely take over everything? It definitely did become very common, but there were still plenty of companies who kept their SQL Servers using physical hardware for performance reasons. There are plenty of others who virtualized, then reverted to physical as well. And now there’s a trend toward the new hotness of containers.

As the options increase, the need for a SQL Server DBA who’s aware of the pros and cons of each just goes up.

DBA Specializations

Sysadmin / Platform

  • Scripting and managing large private implementations
  • High Availability and Disaster Recovery
  • Storage specializations

Performance tuning / architecture

  • Performance tuning critical workloads
  • Specialized code design patterns for high transaction workloads
  • Index tuning and new technologies

Cloud is a part of both of these. Many companies will use the cloud for failover. Many will come to use it primarily. Even then, which cloud do you use? Which way do you deploy to best suit your cost requirements and service level agreements? How do you manage it as it evolves?

Operations can be its own specialization

I used to work with two men who came from the “punch-card” days. When I met them they ran operations for a set of proprietary Hadoop-like clusters for a dot com. They were doing NOSQL before I ever heard it called NOSQL.

They were really good at operations. They adapted to new things to manage, and they did it very well.

That made me realize that as long as I’m good at solving a problem for a business with technology, I’m going to be able to find a job. I just need to keep adapting as the technology evolves.

Business intelligence is another example

The rise of self-serve data sounded like it was going to put BI folks out of business at the beginning.

After all, PowerPivot used “The Engine of the Devil!”

Who needs to build cubes anymore if people can just do everything they need in an Excel spreadsheet that references the cloud?

This customer wants their BI team more than ever

BI consultants now

I don’t work in Business Intelligence, but people seem to be really productive, happy, and active in that field.

More options for getting things done resulted in more work, and more demand from business users for interesting data to work with. People want to figure out how to make their data meaningful in new ways.

What if you’re just starting out?

If you are…

  • Fresh in the industry, no experience
  • Someone who loves working with data

Should you focus on just Microsoft SQL Server?

Well, no, but that has nothing to do with SQL Server, or even the cloud specifically. Don’t settle down too fast. Why marry one technology before even getting to know others?

If you’re in this position, it’s an amazing time to experiment with all the technologies out there. This definitely includes cloud technology, but give yourself freedom to explore and try as many new things as you can get your hands on.

Our questioner isn’t in that position

Here’s how they answered my questions…

1. What made you decide to be a SQL Server DBA?

  • Accidental DBA, support background

2. What do you like about being a DBA?

  • Inner workings of SQL Server, flagship technologies. (NOT writing reports.)

3. Are you curious about cloud technologies?

  • Yes

The cloud is part of your job security

Get to know different options to deploy SQL Server to the cloud.

  • Strengths
  • Weaknesses / limitations

Ask questions around these areas…

  • Monitoring
  • HA/DR
  • How to support it / what maintenance is required

Cloud is a great reason to get to go to conferences! You need to provide the curiosity and justification for attending the training. Ask, and keep asking. When you do get to go to training, take the initiative to schedule time and share what you learned with your team. (It really helps when it comes time to ask to go again.)

Will the robots take my job?

Robots may take EVERYONE’s job. I can’t change that.

However, I think as database people, we’re probably near the end of the list of jobs that will be replaced by Skynet. The robots need us to serve them until they are fully self-sustaining, and no longer need humans at all.

But for the foreseeable future, there’s tons of work for data professionals to answer the question…

“How do we get the most out of all this stuff?”

My #1 advice: Look for problems you want to solve at work, and go after them

Don’t get stuck on a job title. I did this at one point in my career, and it was a huge mistake. The job title “DBA” may disappear, or you might find you like a job with a different title.

That’s fine. It’s not like “administrator” is a sexy term.

You get one life. You may as well get to solve problems that you enjoy while you work. That’s way more important than how the HR department tries to classify you.

Got a question for Dear SQL DBA?

Ask away at LittleKendra.com/dearsqldba!

What’s that Garbage in my Execution Plan? (Dear SQL DBA Episode 27)


Today I was working on some code samples for a user question, and I hit a weird roadblock.

There was a bunch of garbage in my execution plan that I couldn’t explain. And by ‘garbage’, I mean a nested loop to a whole branch of code that I hadn’t asked SQL Server to run — and a warning about an implicit conversion possibly causing problems with the quality of my execution plan.

It took me a while to figure out the issue, and along the way I asked the following questions:

Am I really querying a table, or am I accidentally querying a view? (It’s a table! I checked at least three times.)

Is there some weird computed column in the table that I don’t know about? (Nope, nothing involved was a computed column, I checked that twice.)

Am I awake right now? (Pinched myself, appeared awake. I have had weird dreams about SQL Server before, though.)

Do I actually know anything about SQL Server? (Just about everyone has imposter syndrome sometimes.)

Having gone through this checklist, I decided that I was awake, and that I could figure things out by looking through the execution plan carefully.

Sure enough, calming down and stepping through the plan did the trick. Just like it always has.

Watch the 18 minute video to find out the mysteries in this execution plan, or scroll on down to read about what I discovered. If you enjoy the video, you might like to subscribe to the podcast. A review on iTunes will help other folks find out about the show.

Here’s what the weirdness looked like

I was using the WideWorldImporters sample database from Microsoft. And I was running a stupid simple query.

Stupid simple like this:

SELECT 
    CustomerName
FROM Sales.Customers;
GO

I’m selecting one column. There are no predicates at all. Sales.Customers is a normal old rowstore table, and CustomerName is nvarchar(100).

For a query like this, I’d expect a very simple plan: an index seek or scan operator to pull back the data, and a SELECT operator.

Instead, I saw an execution plan with 20 nodes, and a big old warning on the SELECT operator.

Here’s what it looked like in SQL Sentry Plan Explorer:


If you’d like to play along in a more interactive version, here’s the query over at Paste the Plan. (The website view doesn’t show all the operator properties I’m going to talk about here, but you can grab the XML and use it in SSMS or Plan Explorer to see the detail if you’d like.)

Hovering over the warning on the SELECT operator, here’s the warning (this is the tooltip from SSMS):

Weird, it’s warning about the SalesTerritory column. I didn’t ask for it to do anything at all with SalesTerritory. Why is it doing a type conversion on that column?

Let’s start at the top right of the query plan

When I’m in doubt about an execution plan, I like to just start at the top right of the query plan and work my way through. I think of that top right operator as the “driver”.

In this case, it’s a clustered index scan of Sales.Customers. That makes sense: I asked for all the customer names. Hovering over that operator, though, there is something funny. When I look at the ‘output’ columns, it is outputting not only CustomerName, but also DeliveryCityID!

So what’s it doing with DeliveryCityID?

Moving one step to the left in the plan, there’s a nested loop operator. Hovering over that operator, it says that it outputs the CustomerName column to the select operator. (Good, because that’s what we asked for!)

It also says that the Outer References for the nested loop are based on DeliveryCityID. OK, so it’s pulling back that column because it needs it to run the nested loop. We still don’t know why, but if we hunt around in that branch of the plan, maybe there’ll be a clue.

At this point, I started hovering over operators in that branch of the plan

As in life, when you’re lost in an execution plan, move around slowly and carefully, observe your surroundings, and look for your mom. I mean, look for inspiration.

I could see that the query was pulling from the Cities and StateProvinces tables. And there were a bunch of filter operators as well.


Here’s what the filters are doing:

  • is_rolemember(N'db_owner')<>(0)
  • is_rolemember([Expr1011]+N' Sales')<>(0)
  • [Expr1012]=session_context(N'SalesTerritory')
  • original_login()=N'Website'

This is security garbage! Err… a security feature!

Aha! This is a definite clue. Some sort of security wizardry has been applied to this table, so that when I query it, a bunch of junk gets tacked onto my query.

I have no shame in admitting that I couldn’t remember at all what feature this was and how it works. A lot of security features were added in SQL Server 2016, and the whole point of a sample database like this is to let you kick the tires of those features.

I did a little remembering, and a little searching, and figured out that this is the Row Level Security feature (RLS) in SQL Server 2016.

Row Level Security (RLS) adds predicates to your query execution plans

Here’s how row level security works. You create a table valued function that can be “inlined” (meaning merged) into your query execution plan. That function determines who can see the data.

Then you create a Security Policy for Row Level Security which defines when the table valued function will be applied to queries against a table.

The whole point of Row Level Security is, actually, that it adds these predicates to your execution plans.
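Here’s a minimal sketch of that pattern, using a hypothetical dbo.Orders table that has its own SalesTerritory column (the real WideWorldImporters policy is more involved, and joins out to Cities and StateProvinces, which is why my plan grew those extra branches):

-- Hypothetical, simplified example of the pattern; not the actual
-- WideWorldImporters security policy.
CREATE FUNCTION dbo.TerritoryFilter (@SalesTerritory nvarchar(50))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS allowed
    WHERE IS_ROLEMEMBER(N'db_owner') <> 0
       OR @SalesTerritory = CONVERT(nvarchar(50), SESSION_CONTEXT(N'SalesTerritory'));
GO

-- The security policy attaches the function to the table as a filter predicate,
-- so it gets inlined into every query plan that touches the table.
CREATE SECURITY POLICY dbo.OrdersFilter
    ADD FILTER PREDICATE dbo.TerritoryFilter(SalesTerritory) ON dbo.Orders
    WITH (STATE = ON);
GO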

How do I tell if Row Level Security changed my plan?

There’s a really easy way to tell if your plan was modified by RLS; I just didn’t know to look for it.

Click on the ‘SELECT’ operator in the plan and look down in the properties pane. If you see ‘SecurityPolicyApplied’ = True, then parts of your execution plan may have come from a table valued function that Row Level Security added in.
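You can also check from TSQL which security policies exist in the database and which tables they filter. A quick sketch using the catalog views that ship with SQL Server 2016 and later:

SELECT sp.name AS policy_name,
    sp.is_enabled,
    pr.predicate_type_desc,
    OBJECT_SCHEMA_NAME(pr.target_object_id) + N'.'
        + OBJECT_NAME(pr.target_object_id) AS target_table,
    pr.predicate_definition
FROM sys.security_policies AS sp
JOIN sys.security_predicates AS pr
    ON sp.object_id = pr.object_id;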

Should you use row level security?

Wellll…. maybe. If you’re interested, read up on the possible loopholes in RLS as it stands now, and consider if those would impact you or not. Aaron Bertrand has a great article to get you started: read SQL Server 2016 Row Level Security Limitations, Performance and Troubleshooting.

Everyone feels dumb looking at Execution Plans sometimes

I look at plans a lot, and still, they had me questioning my sanity today. When I first started doing performance tuning in SQL Server, I understood so little about plans that I gave up pretty easily.

I’m really happy that I kept going, though. Because as confusing as they are, most of the time the answers you’re looking for are right there in the plan. Somewhere.

Just keep swimming.

Limiting Downtime for Schema Changes (Dear SQL DBA Episode 25)

You need to release schema changes while the SQL Server is in use. Learn why code generation tools write odd scripts and how to stay sane amid rapid releases in this 28 minute video… or scroll down to read a summary.

If you enjoy the podcast, please write a review on iTunes!

Dear SQL DBA…

How can I achieve near zero downtime deployment with database schema changes? Can I use Peer-to-Peer Replication or some other feature to make this easier? It seems very complicated.

We use SQL Server Data Tools (SSDT) and the developers work agile and quite often make small changes to the schema. And we have found there are some snags with it, since SSDT generates the change script.

Here is an example of a common problem: when the developers add a column to a table, you often get a Create/Insert/Drop Table change script generated by SSDT. Avoiding table locking when possible would help achieve near zero downtime deployment.

Peer to Peer Replication makes schema changes harder to manage, not easier

I totally get why it seemed like P2P might help with this: it’s the only technology in SQL Server where you can have multiple “write” nodes for the same database.

Microsoft’s documentation on Peer to Peer replication says

Publications must allow schema changes to be replicated. (This is a setting of 1 for the publication property replicate_ddl, which is the default setting.)

I’m not sure if that’s a hard requirement (and it just won’t work if you disable replicating schema changes), or if your use of P2P is unsupported if you disable replicating schema changes— but you want to meet the requirement either way.

The requirement simply makes sense when it comes to data consistency. If I drop a column in a table on Node 1 at 2 pm, should the application be allowed to insert rows into the same table on Node 2 at 2:05 pm if I haven’t applied the schema change to Node 2? How would those rows get replicated back to Node 1? This would put the column into a Schrödinger’s cat situation: does the column exist, or not?

Other things to think about:

  • Sometimes code changes may break because they aren’t allowed by replication (such as changing a Primary Key)
  • When you have new tables, they have to be either manually added, or you need to manage the TSQL to add them (articles aren’t added automatically)
  • Adding or dropping articles requires quiescing, which is basically downtime for writes
  • Changes of a very large volume of data can make the replication get behind and the system harder to manage
  • You need to make sure that writes have been fully propagated to all nodes before moving the application to read from another node, otherwise writes that users have already seen could seem to disappear
  • You have to manage conflicts if the same row is modified on more than one node
  • Peer to Peer replication is also not compatible with lots of newer features at this time (In-Memory OLTP tables, Availability Groups), which makes its future seem uncertain
  • Peer to Peer replication is Enterprise Edition only, so if you don’t already have the licensing for multiple nodes, it’s a very expensive gamble.

Peer to Peer replication wasn’t built to solve this particular problem. You can manage schema changes while using Peer to Peer, but it’s tough stuff.

Enterprise Edition has a cool feature when it comes to adding columns

For the example you mention of adding a column, some automatic code-writing tools do the “create a new table, insert, rename” operation because adding a column in place can sometimes be a very large, painful alter: it’s a “size of data operation.”

When you add a not-nullable column with a default value, SQL Server may have to go through and update every single row on the table to add that default value. That’s a really big in-place operation, and if you cancel it midway, you may be in for a nasty rollback on a large table. That’s why some folks gamble on just loading up a new table and switching it in, instead. In some cases, it’s faster!

I say adding a column may be a size of data operation because you have options:

  1. SQL Server Enterprise Edition has a feature where adding a new column with a default value avoids locking the table with a size of data operation.  That feature was added in SQL Server 2012. Remus Rusanu wrote a great article about it, Online non-NULL with values column add in SQL Server 2012.
    • This doesn’t work for all data types, and it only works for a constant (not something like NEWID where every row has a different value). Read Remus’ post for more details.
  2. You also have the choice of adding your new column as nullable, and writing code to manually work through the table and populate the rows in batches, as sketched below.
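Here’s a rough sketch of option 2. The table, column, and batch size are all hypothetical placeholders; real code needs to account for concurrent inserts, indexes, and log backups between batches:

-- Step 1: adding a nullable column with no default is a quick metadata-only change.
ALTER TABLE dbo.BigTable ADD NewColumn int NULL;
GO

-- Step 2: backfill in small batches so locks stay short and the transaction
-- log gets a chance to clear between batches.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (5000) dbo.BigTable
    SET NewColumn = 0
    WHERE NewColumn IS NULL;

    SET @rows = @@ROWCOUNT;
END;
GO

-- Step 3 (optional, once the backfill is done): add a default for future rows
-- and make the column NOT NULL. The ALTER COLUMN still scans the table to
-- validate, but it doesn't rewrite every row.
ALTER TABLE dbo.BigTable ADD CONSTRAINT DF_BigTable_NewColumn DEFAULT (0) FOR NewColumn;
ALTER TABLE dbo.BigTable ALTER COLUMN NewColumn int NOT NULL;
GO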

Super awkward – you might have that Enterprise feature everywhere except production

If you have Developer Edition in your development and staging environments, but Standard Edition in production, changes adding non-null columns with default values could run very quickly everywhere except production on your large tables. That’s because Developer Edition gets all the features of Enterprise Edition.

Note: I also talked about size of data operations in a previous Dear SQL DBA episode when I discussed another schema change: altering an INT column to a BIGINT.

Avoiding downtime with schema changes is mostly about process and experience

I’ve worked in a bunch of environments where we rolled schema changes to production throughout the week, and worked hard to limit downtime. Our biggest challenges were:

  • The code released wouldn’t have the intended effect, and multiple hotfixes would sometimes need to be written quickly if the issue was urgent.
  • Problems would happen with replication in production – either replication would be delayed because a lot of modifications were occurring, or a change in an “upstream” system would cause a huge amount of data processing to happen in a “downstream” database

But while there were bumps along the road, we got pretty good at it! It’s not easy, but the more you do it, the better you get at it.

Push schema changes before application changes

Adding a new feature? Tables and columns get added to the database well in advance of anything using them. This makes it easier if the schema changes need to be rolled back. Or if something goes wrong and the schema change doesn’t succeed, that whole release doesn’t have to be rolled back.

Test in a full scale pre-production environment

You need a staging environment to roll changes to before they hit production. Key points:

  • You need a full dataset. If production is a 2TB database, you need a 2TB database here.
  • This can’t be a dev environment. Dev environments are chaos, and have a bunch of pre-release code in them. This needs to match production.
  • The environment needs to be configured like production as well– if production is a cluster with replication, this environment needs to have a cluster with replication as well.

As long as you use this environment for testing and validating production changes, you can license this with Developer Edition — but note that “super awkward” limitation above where adding a column could be fast everywhere except production. And you still need to invest in storage and hardware. (It’s possible to go cheap on the hardware, but if you do so then you won’t get a reliable estimate of how long it’s going to take to deploy changes to production.)

If you’re just starting down the “deploy all the time and minimize downtime” road and you don’t have this, that’s OK.

When people start asking, “How can we make sure these mistakes don’t happen next time?”, this is your #1 suggestion.

Some code needs to be custom

For smaller changes, it may be OK to use automatically generated code to roll out the change, but larger tables and sensitive databases can require custom code: for example, to take advantage of the Enterprise Edition feature above, or to backfill new columns in batches.

The DBA usually finds out there’s a need for custom code when they’re pushing a change to the pre-production (aka staging) environment and it takes longer than expected.

It needs to be OK to screw up and have downtime

This was totally counter-intuitive to me. The whole point is that you’re trying to limit downtime.

But this is hard stuff. Things are going to break. Everyone needs to be able to learn when that happens, and being afraid doesn’t help people communicate better.

You need to do postmortems when you have downtime, and you need to have postmortems that don’t suck!

The hardest part of making this work is preventing the human tendency to blame from ruining the whole thing.

But we really can’t have downtime because of X, Y, Z

If it’s really not OK for schema changes to cause downtime, because people will be harmed, too much money will be lost, or some other reason, then either:

1. Releasing schema changes to SQL Server while production is active isn’t right for that application.

2. You need a highly customized architecture for each application designed to make this work and avoid downtime.

  • For some applications, you may be able to cache the reads in the application tier or a read-only copy of the database, and queue the writes while you’re doing the schema changes.
  • Some applications may be able to be sharded into smaller databases, so data sizes are kept small. Schema changes on small tables are much simpler than large tables.

There may be other options as well, but it’s going to be a custom solution for each application.

Signs Your SQL Server is Running with Scissors (Dear SQL DBA Episode 24)

Does your team know what it’s doing with SQL Server? Learn what a consultant looks for when assessing a team, and signs that SQL Server may be badly configured.

Watch the 23 minute video, or scroll down to read a summary of this episode. If you enjoy the podcast, I’d appreciate your review on iTunes! Learn how to ask your own Dear SQL DBA question and subscribe.

Dear SQL DBA,

When sitting down for the first time with an existing SQL Server environment, what do you look for as an indication that they know what they’re doing? This is along the lines of Van Halen and the brown M&Ms (http://www.npr.org/sections/therecord/2012/02/14/146880432/the-truth-about-van-halen-and-those-brown-m-ms).

Perceptive in Pittsburgh

What a fun question!

I’ve checked out new SQL Servers a few different ways. Sometimes I’ve looked at the SQL Server for the first time live, while talking to the customer. And sometimes I’ve had a bunch of information about the SQL Server’s configuration and performance metrics to review before I ever talk to anyone.

What I learned was not to judge a team by their SQL Server. Some configurations may look problematic, but make a lot more sense when I talk to the team and dig into problems they’re facing.

For instance, there’ve been many times when a team was facing a performance issue, and at first glance their SQL Server looked stupidly underprovisioned in terms of memory. Upon digging into the problem I found that adding more memory wouldn’t solve their particular problem. One size doesn’t usually fit all.

But there are definitely things that make me think a team has their act together. And there are specific configurations that make a DBA team look like they may need to have a little bit of an intervention.

How I identify a capable DBA team

There are two traits that make me think a team is capable of doing a good job managing a SQL Server, regardless of how much they know about the specifics of what SQL Server needs at the time:

  1. Taking ownership and responsibility for the configuration
  2. Being willing to discuss and challenge any element of the configuration

This doesn’t mean being willing to change any setting at the drop of a hat. It just means being open to learning about and discussing changes, whatever they are, and evaluating the benefits, risks, and costs.

If you have these two traits, you can get your SQL Server into great shape and care for it over time. It doesn’t matter if you know nothing about SQL Server, because taking ownership means you’re willing to learn.

If you don’t have these traits, you’re going to struggle, regardless of what you know — because what works well will change, and you won’t.

A litmus test: is your team capable?

Do you ever find yourself saying or thinking something like this?

There’s no way that we can do (some change), because I can’t get the (money / permission / other resource)

If you have that kind of mental block, you’re not really taking ownership and responsibility for the configuration.

This is a matter of perspective, but it’s a perspective that makes a huge difference. We all work with funding and resource constraints. If you see a change in configuration that you think would be beneficial, the perspective of the capable team is that you want to:

  • Identify scenarios where the change will make an improvement
  • Research and find resources to back up your idea
  • Design a way to measure the current state and potentially create a testing scenario to prove that it will work
  • Make a plan to make your case to business owners

It’s possible that the change won’t be approved. That’s OK — you file away your information, and if more incidents come up around the problem you’re trying to solve, you’re prepared with a possible solution.

Essentially, the capable team has the perspective, “What can we do to make this better?” They ask this question even if they can’t implement all the solutions.

If you and your team don’t have this perspective, this is something you can change. You don’t have to try to change others: just work to change your own perspective. If you can successfully change your approach and viewpoint, it will naturally spread to your team. (And as a big bonus, it’s likely to make you a happier employee.)

Five signs that your SQL Server is running with scissors

All this being said, there are some settings that make it look like you don’t know what you’re doing with a SQL Server.

1. Priority Boost is enabled

There’s been a warning on the Books Online entry for Priority Boost for years which says:

Raising the priority too high may drain resources from essential operating system and network functions, resulting in problems shutting down SQL Server or using other operating system tasks on the server.

It’s also deprecated. But it’s still enabled on plenty of SQL Servers.

Finding this is like seeing the sky turn green. You know that you’re in for a colorful time, because who knows what else is going on.

2. Processor affinity masking is configured

This is a sign that the SQL Server isn’t properly licensed. Some folks think that they can use processor affinity to reduce SQL Server’s licensing costs: they want to limit SQL Server to use only a specific number of cores, and pay for that exact number.

SQL Server licensing doesn’t work that way. You need to license all the cores that are provisioned to the Virtual Machine at the guest level, or license the physical cores installed in the server at the host level.

Finding this is really not fun, because it’s a very expensive piece of bad news to deliver.
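Both of these settings are quick to check from sys.configurations. A minimal sketch (a value_in_use of 0 is the default for all of these):

SELECT name, value, value_in_use, is_dynamic
FROM sys.configurations
WHERE name IN (N'priority boost', N'affinity mask',
    N'affinity I/O mask', N'affinity64 mask', N'affinity64 I/O mask');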

3. Backup Log with TRUNCATE_ONLY / NO_LOG / DISK =’NUL:’

This one is becoming more rare because Microsoft removed the syntax for log backups with TRUNCATE_ONLY and NO_LOG in SQL Server 2008. That seems to have helped people understand that throwing away a log backup isn’t a good idea. If you don’t need point in time recovery and incremental restores, use the SIMPLE recovery model instead.

But some people still use the workaround which Microsoft left in place, where you can back up a log to NULL. This usually gets left in place if people were trying random fixes from the internet and didn’t understand what they were doing.
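For reference, the anti-pattern to hunt down looks something like the first statement below (YourDatabase is a placeholder). If you don’t need point in time recovery, the supported alternative is to switch to the SIMPLE recovery model and keep taking full (and possibly differential) backups:

-- The anti-pattern: this throws away the log backup chain, so you can't
-- restore to a point in time after it runs.
BACKUP LOG YourDatabase TO DISK = 'NUL:';

-- The better option when point in time recovery isn't needed:
ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;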

4. Auto-shrink is on when performance is important

It’s very well documented on the internet that shrinking can cause problems in SQL Server, and that auto-shrink is bad. It can cause blocking, random performance problems, and if you feed auto-shrink after midnight then your SQL Server will wreck the datacenter.

But I still sometimes find auto-shrink enabled.

The funny thing about this one is that usually people know they shouldn’t have it on, and are surprised by it. Finding this is usually a sign that the team is stuck in such a reactive fire-fighting mode that they haven’t scheduled any time to evaluate basic configuration errors.
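Checking for it takes one query against sys.databases (YourDatabase in the commented fix is a placeholder):

-- Find databases with auto-shrink enabled.
SELECT name, is_auto_shrink_on
FROM sys.databases
WHERE is_auto_shrink_on = 1;

-- Turning it off is a simple change:
-- ALTER DATABASE YourDatabase SET AUTO_SHRINK OFF;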

5. Multiple instances in the same Windows install when performance is important

This is an example of finding old-school thinking that doesn’t make sense anymore. “Instance stacking” is when you install multiple SQL Server instances into the same Windows install. SQL Server lets you do it, and it’s supported to do so.

But when performance is important, this is a nightmare to manage. It’s like having a house full of hungry teenagers where any one of them can eat all the food in the fridge IN AN INSTANT, and then the other teenagers get really upset if they are awake.

If performance is important, the better approach is to either consolidate the instances (thereby simplifying troubleshooting by letting one SQL OS schedule resources across the whole server), or separate your teenagers into their own apartments using virtualization. If you have security reasons to keep ’em separated, virtualization gives you much more flexibility over time.

Finding this often means that the team used some really old information when they designed the environment, because instance-stacking was popular when Avril Lavigne first came out.

What do I do to take control of my SQL Server?

I don’t mean to shame anyone who realizes they’ve got a problem in these areas, or who isn’t sure.

It’s not a personal failing. It doesn’t mean you’re dumb. It means you have a prioritization problem. You need to make some time soon to step back and take charge of your configuration.

Talk to your manager about planning some time soon for you to perform a SQL Server health check. Look at free tools available for this online, and practice using them in a test environment. Then take a look at your production server.

Here are some free tools that may help you out — check them out, test ’em, and pick your favorite:

Don’t rush in to randomly changing settings. Document everything first, and then use a good change control process. If you don’t have change control in place, start with my free Change Control Template.

Why is My Query Faster the Second Time it Runs? (Dear SQL DBA Episode 23)

Today’s question is about why a query might be slow at first, then fast the next time you run it.

Watch the 26 minute video or scroll on down and read the written version of the episode instead. If you enjoy the podcast, I’d appreciate your review on iTunes! Info on how to ask questions and subscribe is here.

Dear SQL DBA,

Whenever I create a complex view, the first time I use it in a query, the query is slow. The same thing happens after I alter the view. What is the reason behind this?

This is a great question — because when you ask this question, you’re about to discover a lot of interesting, useful things about how SQL Server runs queries.

There are two ways that SQL Server can use memory to make a query faster the second time you run it. Let’s talk about how the magic happens.

1) It takes CPU time to figure out how to run a query. SQL Server uses memory to cache execution plans to save time the next time you run the query.

The first time you run a query using the view, SQL Server has to ‘compile’ an execution plan to figure out the best way to run the query.

SQL doesn’t compile the plan when you create the view; it compiles a plan when you first use the view in a query. After all, you could use the view in a query in different ways: you might select only some columns, you could use different ‘where’ clauses, you could use joins.

Secretly, it doesn’t matter too much that you’re using a view. When you run a query that references it, behind the scenes SQL Server will expand the TSQL in the view just like any other query, and it will look for ways to simplify it.

So SQL Server waits and compiles a plan for the exact query you run.

  • Depending on how many joins are in the view and how many ways SQL Server could run the query, it may take it a while to compile the query execution plan.
  • SQL Server tries to come up with a decent plan quickly. But I have seen some cases where query compile time took 5+ seconds, and query execution time was much smaller.

SQL Server is designed to store the execution plan for a query in memory in the execution plan cache, in case you run it again. It would be very costly for the CPUs to generate a plan for every query, and people tend to re-run many queries.

If you re-run a query and there is already an execution plan in the plan cache, SQL Server can reuse it and save all that compile time.

When you alter a view, this will force SQL Server to generate a new execution plan the first time a query uses the view afterward. Something has changed, so SQL Server can’t use any plans that were in cache before.

Restarting the SQL Server, taking the database offline and online, memory pressure, and many server level settings changes will also clear execution plans out of cache, so you have to wait for a compile.

How much time are you spending compiling?

There are a few ways to see this:

  • If you are testing and want to see how much time is spent on compiling, you can run SET STATISTICS TIME ON; in your session. After that, SQL Server will show you the ‘parse and compile time’ for each query in the ‘Messages’ window for that session. (There’s a small example after this list.)
  • If you’re looking at execution plans, ‘compile time’ is stored in the XML. You can see it in the properties on the upper left-most operator. It’s reported in milliseconds and is the same as the ‘elapsed time’ that appears under parse and compile time from SET STATISTICS TIME.


  • Query Store collects compilation time statistics, including compile duration. You can see some of the details in this post I wrote on Query Store and recompile hints.
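Here’s a minimal sketch of measuring parse and compile time for a single query; dbo.YourComplexView and SomeColumn are placeholders, and the RECOMPILE hint is just there to force a fresh compile each run while testing:

SET STATISTICS TIME ON;
GO

SELECT SomeColumn
FROM dbo.YourComplexView
OPTION (RECOMPILE); -- forces a fresh compile so the compile cost shows up every run
GO

SET STATISTICS TIME OFF;
GO

-- Look in the Messages tab for "SQL Server parse and compile time"
-- versus "SQL Server Execution Times".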

Views aren’t really a problem. Sometimes, lots of joins are a problem, but SQL Server still has tricks to compile a plan quickly.

When people use complex views, particularly nested views, they often end up with a LOT of joins in each query.

When SQL Server has a lot of joins, optimization gets harder. There are a ton of ways the query could be executed.

The SQL Server optimizer doesn’t want you to wait a long time while it looks at every single thing it could possibly do. It takes some shortcuts. It wants to get to a good plan fast.

Generally, SQL Server tries to keep optimization times short, even when you have a lot of joins.

But there are cases where sometimes compilation takes longer than normal.

What if you see multiple seconds of parse and compile time?

Usually compilation time is a number of milliseconds, but I have seen some cases where it’s seconds. This could be for a few reasons:

  1. SQL Server had to wait for CPU when it was trying to compile the query. It’s not the query’s fault, there just weren’t processors available. I would look at SQL Server’s wait statistics to identify this. If there were notable SOS_SCHEDULER_YIELD waits in that time period, the issue is more server load than the specific query.
  2. You’ve hit an identified bug for compile time. Some cumulative updates for SQL Server fix long compilation times. It’s worth looking at the version of SQL Server you’re on, setting up a fully patched test instance, and checking if applying updates might improve your compilation times.
  3. You’ve hit an unidentified bug for compile time. SQL Server works pretty hard to compile plans quickly, so multi-second compile time usually looks like a bug to me, if it’s not due to server load. For views and complex queries, I would:
    • Look to see if I could simplify the query wherever possible, to make things easier for the optimizer.
    • Check if indexes might simplify the plan. Good indexes can make choosing a good plan easier and faster.
    • Try query hints as a last resort. The problem is that hints are really directives, and force a behavior — what’s beneficial to force today may not be so great after the next upgrade, or even  if my data sizes change over time.

2) It takes time to read data from disk. SQL Server uses memory to cache data in the Buffer Pool so it doesn’t have to go to disk the next time you use that data.

There are more reasons that the second run of a query might be faster.

The first time you run the query it may be using data that is on disk. It will bring that into memory (this memory is called the “buffer pool”). If you run the query again immediately afterward and the data is already in memory, it may be much faster — it depends on how much data it had to read from disk, and how slow your storage system is.

When you are testing, you can see if your query is reading from disk (physical reads and read ahead reads) by running: SET STATISTICS IO ON;
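For example (the view name is a placeholder; the numbers in the sample output are made up, but the format is what SQL Server prints in the Messages tab):

SET STATISTICS IO ON;
GO

SELECT SomeColumn
FROM dbo.YourComplexView;
GO

-- Messages output looks something like:
-- Table 'SomeTable'. Scan count 1, logical reads 1200, physical reads 850,
-- read-ahead reads 900, ...
-- Physical reads and read-ahead reads mean it had to go to disk.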

One difference with this type of memory is that your buffer pool memory is not impacted by ALTERING the view. SQL Server does not dump data from the buffer pool when you change a view or procedure. Instead, it keeps track of how frequently different pages of data are used, and ages out the least useful pages from memory over time.

So this might be part of the answer in your case, but it wouldn’t necessarily explain why the query would be slower on the first run after an ALTER — unless the data pages you’re using just hadn’t been queried in a while and happened to have aged out of the buffer pool.

Takeaways for testing queries

I usually tune queries with a “warm cache”

Most queries that reference commonly used tables have a good chance of the data they need being in cache.

To test against a warm cache, I run the query once, and don’t pay a ton of attention to that first run.

I run the query again and measure the duration with the execution plan cached and the data pages in the buffer pool.

You can tune with a “cold cache”, but be careful

If I have a reason to believe that the data won’t be in cache when the query is run, then I will test it against a “cold cache”. I might need to do this if it’s a nightly query that runs, and the data it uses isn’t relevant at all to the daytime workload– so the pages are likely to not be in the buffer pool when it’s time for the job to run that night.

To test against a cold cache, you have to do some things that are NOT friendly for a production server — so make sure you only use this approach against test instances (there’s a combined sketch after this list):

  • Run DBCC DROPCLEANBUFFERS to flush unmodified pages from the buffer pool. (This will slow down your whole instance because EVERYBODY then has to start reading from disk)
  • If modifications have been happening as part of the test, run a manual CHECKPOINT to flush dirty pages to disk from the buffer pool
  • Use RECOMPILE hints or a variant of DBCC FREEPROCCACHE to make sure you measure compilation time on each run
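Putting those together, a test-instance-only sketch might look like this (the database and view names are placeholders):

-- TEST INSTANCES ONLY: flushing the buffer pool slows down everything
-- else running on the instance.
USE YourTestDatabase;
GO

CHECKPOINT;            -- flush dirty pages to disk first
DBCC DROPCLEANBUFFERS; -- then empty the buffer pool
GO

SET STATISTICS TIME ON;
SET STATISTICS IO ON;
GO

SELECT SomeColumn
FROM dbo.YourNightlyReportingView
OPTION (RECOMPILE);    -- measure compile time on every run
GO

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO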