MAXDOP of Confusion (Dear SQL DBA Episode 8)

on July 14, 2016

Learn how to configure the Max Degree of Parallelism and Cost Threshold for Parallelism settings in SQL Server - and how SQL Server 2014 SP2 and SQL Server 2016 change the way that SQL Server automatically configures some SQL Servers with lots of cores.

This is a “listen-able” 20 minute video. Show notes with clickable links are below the video.

Dear SQL DBA…

I am completely confused as to how to set Max Degree of Parallelism for an OLTP workload. Having looked at three recommendations recently and applied it to my own machine I get 3 different values. My machine has 1 physical CPU with 4 cores, 4 visible schedulers and a hyperthreading ratio of 4. However I’ve got recommendations to set either to 1, 2 or 4. What should it be?

Sincerely,

Max Degree of Confusion

I don’t blame you for being confused– this is a tough one!

The good news is that for Max Degree of Confusion’s specific question, I’ve got a clear recommendation for a default setting for “Max Degree of Parallelism” and “Cost Threshold for Parallelism”. I think you need to set both, and I’ll explain why.

But for people who have a lot more cores in their servers, things are a little more interesting– especially if you’re running SQL Server 2014 SP2+ or SQL Server 2016.

Let’s break this down and talk about how to figure out the setting, then we’ll circle back to our 4 core example.

Settings: Max Degree of Parallelism (“MAXDOP”) and Cost Threshold for Parallelism

When you run a query, SQL Server estimates how “expensive” it is in a fake costing unit, let’s call it Estimated QueryBucks.

If a query’s Estimated QueryBucks is over the “Cost Threshold for Parallelism” setting in SQL Server, it qualifies to potentially use multiple processors to run the query.

The number of processors it can use is defined by the instance level “Max Degree of Parallelism” setting.

When writing TSQL, you can specify maxdop for individual statements as a query hint, to say that if that query qualifies to go parallel, it should use the number of processors specified in the hint and ignore the server level setting. (You could use this to make it use more processors, or to never go parallel.)

KB 2806535 helps determine Max Degree of Parallelism

Hooray, Microsoft has published some guidance on this!

KB 2806536 is titled Recommendations and guidelines for the “max degree of parallelism” configuration option in SQL Server.

An acronym: NUMA nodes

KB 2806535 explains that you need to determine two things about your hardware

How many NUMA nodes it has
How many logical processors are in each NUMA node

NUMA - simpler than it sounds

NUMA means “Non-Uniform Memory Access.” (That doesn’t really explain much of anything, I know, but if I didn’t tell you what it stands for it would be weird.)

When you buy a modern server, typically each physical CPU has many logical processors. Let’s say we buy a server with 1 physical CPU and 10 logical processors, and the server has 256GB of RAM. That 1 physical CPU is snuggled up right next to all the memory chips, and it’s really fast for all 10 logical processors to access that 256GB of RAM. Our server has one NUMA node.

But what if we bought a server with 2 physical CPUs and 10 logical processors each, and 512GB of RAM? We would then have 2 NUMA nodes, because a NUMA node is just a physical CPU and its local memory. Each NUMA node would have 10 logical processors and 256GB of RAM.

Logical processors can access all of the memory in the server. It’s just faster for a processor to access the memory that’s hooked up to its own “NUMA node”.

This is important to SQL Server, because it wants queries to be fast.

If a query goes parallel, you want it to use processors from the same NUMA node and access memory local to that node (ideally).

8 is a magic number

The guidance in KB 2806535 is basically this:

Figure out how many logical processors you have in a NUMA node
If you have 8 or more logical processors in a NUMA node, generally you’ll get the best performance at maxdop 8 or lower
If you have less than 8 logical processors per NUMA node, generally you’ll get the best performance setting maxdop to the number of logical processors or lower

Why 8?

It’s not a law or anything– sometimes you can get better performance for a query with a maxdop higher than 8. And if that works out well for your workload, that’s cool!

But in general, using more cores = more overhead to pull everything back together.

8 may be less magical in SQL Server 2014 SP2 and SQL Server 2016 because of “Automatic Soft NUMA”

Hardware manufacturers are packing more and more cores in processors. SQL Server’s making some changes to scale with this.

SQL Server 2014 SP2 and SQL Server 2016 have a feature called “Automatic Soft NUMA”…

This feature is on by default in SQL Server 2016, but can be disabled using ALTER SERVER CONFIGURATION with the SET SOFTNUMA argument
In SQL Server 2014 SP2, you can enable Automatic Soft NUMA configuration by turning on Trace Flag 8079 at the server level

When Automatic Soft NUMA is enabled, if you have more than 8 physical processors in a NUMA node, Soft NUMA will be configured when SQL Server starts up. If you’re running SQL Server in a VM, note that the hypervisor generally presents all virtual cores to the guest as physical cores– whether or not you have hyperthreading enabled on the host server– so this will kick in if you have more than 8 vCPUs.

Messages are written to the SQL Server Error log when this occurs, so it’s very easy to check there at the time of the latest startup for information about what occurred. You can also query the sys.dm_os_sys_info and sys.dm_os_nodes dynamic management views for configuration information.

Bob Dorr explains more about Automatic Soft NUMA configuration in his blog post, “SQL 2016 - It Just Runs Faster: Automatic Soft NUMA” on the “SQL Server According to Bob” blog.

Bob gives an example of a workload running on 2016 where a 30% gain in query performance was obtained by using Soft NUMA with “max degree of parallelism” set to the number of physical cores in a socket– which was 12 in that case.

Fine tuning MAXDOP and Cost Threshold require a repeatable workload

If you really care about performance, you need a repeatable benchmark for your workload. You also need to be able to run that benchmark repeatedly on the production hardware with different settings.

This is one of the many reasons that performance-critical environments buy identical hardware for staging environments.

So what to do with 1 NUMA node and 4 logical processors?

OK, so back to Max Degree of Confusion’s question.

We know that there is 1 physical CPU. That’s one NUMA node. It was 4 logical processors. So we want 4 or lower.

Max Degree of Confusion said that this is an OLTP workload, which means we can have concurrent queries running. That’s a good argument for not using 4 – one longrunning query using all 4 logical processors isn’t going to be a nice day for lots of chatty little queries.

Really, the question in this situation is whether we want to go with maxdop 1 an effectively disable parallelism, or go with maxdop 2 and and have some parallelism.

I would personally start with:

Max Degree of Parallelism set to 2
Cost Threshold for Parallelism set to 50

Wait a second, the KB doesn’t talk about Cost Threshold for Parallelism!

I know, that’s what I’d change about the KB.

Remember, there’s two parts to going parallel:

Is the query’s Estimated cost is over the “Cost Threshold for Parallelism”
If so, how many logical processors is it allowed to use based on the “Max Degree of Parallelism”

SQL Server’s default “Cost Threshold for Parallelism” is 5. A cost of 5 QueryBucks is a super low bar these days.

This default was set back in days when processor power was a LOT MORE SCARCE. Processors have gotten way faster and you can eat a lot of data pretty quickly with a single logical processor these days.

When I was trained as a DBA back on SQL Server 2005, our standard was to raise Cost Threshold to 50 on every server.

11 years later, that has only become less risky. I think it’s a pretty safe default now.

This isn’t a law any more than the magic 8 was a law. It’s just a generalization based on observation.

Would you ever set Max Degree of Parallelism to 1 and disable parallelism?

Sure, if the application was carefully crafted to NEVER need parallelism unless a query hints it higher, maxdop 1 is the way to do that at the server level. Sharepoint is famous for this architecture.

But generally parallelism is a good thing, and you want to allow parallelism for the queries that need it, and benefit for it.

“Cost threshold for parallelism” is your setting for determining which queries “need” it, based on their estimated cost.

Links to learn more

Brent Ozar talks more about these two settings and brings in SQL Server wait stats with his post on CXPACKET
Paul White gave an excellent presentation on parallel query execution at the PASS Summit in 2013. It’s still just as good as when he first presented it. Watch the hour here.
Don’t forget to check out the SQL Server According to Bob blog, by Bob Dorr and Bob Ward of Microsoft. They’ve got that article on Automatic Soft NUMA Configuration and much more cool stuff.
Arvind Shyamsundar wrote a great post on parallel insert in SQL Server 2016 with a comparison of performance at different maxdops. Read it here.