MAXDOP of Confusion (Dear SQL DBA Episode 8)

Learn how to configure the Max Degree of Parallelism and Cost Threshold for Parallelism settings in SQL Server – and how SQL Server 2014 SP2 and SQL Server 2016 change the way that SQL Server automatically configures some SQL Servers with lots of cores.

This is a “listen-able” 20 minute video. Show notes with clickable links are below the video.


I am completely confused as to how to set Max Degree of Parallelism for an OLTP workload. Having looked at three recommendations recently and applied it to my own machine I get 3 different values. My machine has 1 physical CPU with 4 cores, 4 visible schedulers and a hyperthreading ratio of 4. However I’ve got recommendations to set either to 1, 2 or 4. What should it be?


Max Degree of Confusion

I don’t blame you for being confused– this is a tough one!

The good news is that for Max Degree of Confusion’s specific question, I’ve got a clear recommendation for a default setting for “Max Degree of Parallelism” and “Cost Threshold for Parallelism”. I think you need to set both, and I’ll explain why.

But for people who have a lot more cores in their servers, things are a little more interesting– especially if you’re running SQL Server 2014 SP2+ or SQL Server 2016.

Let’s break this down and talk about how to figure out the setting, then we’ll circle back to our 4 core example.

Settings: Max Degree of Parallelism (“MAXDOP”) and Cost Threshold for Parallelism

When you run a query, SQL Server estimates how “expensive” it is in a fake costing unit, let’s call it Estimated QueryBucks.

If a query’s Estimated QueryBucks is over the “Cost Threshold for Parallelism” setting in SQL Server, it qualifies to potentially use multiple processors to run the query.

The number of processors it can use is defined by the instance level “Max Degree of Parallelism” setting.

When writing T-SQL, you can specify maxdop for individual statements as a query hint, to say that if that query qualifies to go parallel, it should use the number of processors specified in the hint and ignore the server level setting. (You could use this to make it use more processors, or to never go parallel.)
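For example, the hint goes in an OPTION clause at the end of the statement (the table and column names here are just illustrative):

```sql
-- Let this statement use up to 4 logical processors if it qualifies
-- to go parallel, regardless of the instance-level setting
SELECT CustomerId, SUM(TotalDue) AS TotalSales
FROM dbo.SalesOrderHeader
GROUP BY CustomerId
OPTION (MAXDOP 4);

-- Or keep it single-threaded no matter what
SELECT CustomerId, SUM(TotalDue) AS TotalSales
FROM dbo.SalesOrderHeader
GROUP BY CustomerId
OPTION (MAXDOP 1);
```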

KB 2806535 helps determine Max Degree of Parallelism

Hooray, Microsoft has published some guidance on this!

KB 2806535 is titled “Recommendations and guidelines for the ‘max degree of parallelism’ configuration option in SQL Server”

An acronym: NUMA nodes

KB 2806535 explains that you need to determine two things about your hardware:

  • How many NUMA nodes it has
  • How many logical processors are in each NUMA node
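If you’d rather not dig through hardware documentation, you can get both numbers from a running instance with a quick DMV query – a sketch, since DMV columns vary a little by version:

```sql
-- Logical processors per NUMA node, from the scheduler list
-- (the hidden DAC scheduler has a different status, so it's excluded)
SELECT parent_node_id AS numa_node,
       COUNT(*) AS logical_processors
FROM sys.dm_os_schedulers
WHERE status = N'VISIBLE ONLINE'
GROUP BY parent_node_id;

-- Node-level view (node 64 is the hidden DAC node, filtered out here)
SELECT node_id, memory_node_id, online_scheduler_count
FROM sys.dm_os_nodes
WHERE node_state_desc = N'ONLINE';
```

The number of rows in the second result set is your NUMA node count, and online_scheduler_count is the logical processors per node.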

NUMA – simpler than it sounds

NUMA means “Non-Uniform Memory Access.” (That doesn’t really explain much of anything, I know, but if I didn’t tell you what it stands for it would be weird.)

When you buy a modern server, typically each physical CPU has many logical processors. Let’s say we buy a server with 1 physical CPU and 10 logical processors, and the server has 256GB of RAM. That 1 physical CPU is snuggled up right next to all the memory chips, and it’s really fast for all 10 logical processors to access that 256GB of RAM. Our server has one NUMA node.

But what if we bought a server with 2 physical CPUs and 10 logical processors each, and 512GB of RAM? We would then have 2 NUMA nodes, because a NUMA node is just a physical CPU and its local memory. Each NUMA node would have 10 logical processors and 256GB of RAM.

Logical processors can access all of the memory in the server. It’s just faster for a processor to access the memory that’s hooked up to its own “NUMA node”.

This is important to SQL Server, because it wants queries to be fast.

If a query goes parallel, you want it to use processors from the same NUMA node and access memory local to that node (ideally).

8 is a magic number

The guidance in KB 2806535 is basically this:

  • Figure out how many logical processors you have in a NUMA node
  • If you have 8 or more logical processors in a NUMA node, generally you’ll get the best performance at maxdop 8 or lower
  • If you have less than 8 logical processors per NUMA node, generally you’ll get the best performance setting maxdop to the number of logical processors or lower

Why 8?

It’s not a law or anything– sometimes you can get better performance for a query with a maxdop higher than 8. And if that works out well for your workload, that’s cool!

But in general, using more cores = more overhead to pull everything back together.

8 may be less magical in SQL Server 2014 SP2 and SQL Server 2016 because of “Automatic Soft NUMA”

Hardware manufacturers are packing more and more cores in processors. SQL Server’s making some changes to scale with this.

SQL Server 2014 SP2 and SQL Server 2016 have a feature called “Automatic Soft NUMA”…

  • This feature is on by default in SQL Server 2016, but can be disabled using ALTER SERVER CONFIGURATION with the SET SOFTNUMA argument
  • In SQL Server 2014 SP2, you can enable Automatic Soft NUMA configuration by turning on Trace Flag 8079 at the server level
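In T-SQL, the 2016 switch looks like this (the change is saved and takes effect at the next instance restart):

```sql
-- SQL Server 2016+: Automatic Soft NUMA is on by default.
-- To turn it off (requires an instance restart to take effect):
ALTER SERVER CONFIGURATION SET SOFTNUMA OFF;

-- To turn it back on:
ALTER SERVER CONFIGURATION SET SOFTNUMA ON;

-- SQL Server 2014 SP2: there is no SOFTNUMA option. Instead, add
-- trace flag 8079 as a startup parameter (-T8079), so it's evaluated
-- when SQL Server starts up.
```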

When Automatic Soft NUMA is enabled, if you have more than 8 physical processors in a NUMA node, Soft NUMA will be configured when SQL Server starts up. If you’re running SQL Server in a VM, note that the hypervisor generally presents all virtual cores to the guest as physical cores– whether or not you have hyperthreading enabled on the host server– so this will kick in if you have more than 8 vCPUs.

Messages are written to the SQL Server Error Log when this occurs, so it’s easy to check the log from the latest startup to see what happened. You can also query the sys.dm_os_sys_info and sys.dm_os_nodes dynamic management views for configuration information.
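For example (softnuma_configuration_desc was added in SQL Server 2016, and sp_readerrorlog is a handy but undocumented procedure):

```sql
-- How soft NUMA ended up configured at the latest startup
SELECT softnuma_configuration_desc
FROM sys.dm_os_sys_info;

-- Search the current error log (log 0, SQL error log type 1)
-- for NUMA-related startup messages
EXEC sys.sp_readerrorlog 0, 1, N'NUMA';
```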

Bob Dorr explains more about Automatic Soft NUMA configuration in his blog post, “SQL 2016 – It Just Runs Faster: Automatic Soft NUMA” on the “SQL Server According to Bob” blog.

Bob gives an example of a workload running on 2016 where a 30% gain in query performance was obtained by using Soft NUMA with “max degree of parallelism” set to the number of physical cores in a socket– which was 12 in that case.

Fine-tuning MAXDOP and Cost Threshold requires a repeatable workload

If you really care about performance, you need a repeatable benchmark for your workload. You also need to be able to run that benchmark repeatedly on the production hardware with different settings.

This is one of the many reasons that performance-critical environments buy identical hardware for staging environments.

So what to do with 1 NUMA node and 4 logical processors?

OK, so back to Max Degree of Confusion’s question.

We know that there is 1 physical CPU. That’s one NUMA node. It has 4 logical processors. So we want 4 or lower.

Max Degree of Confusion said that this is an OLTP workload, which means we can have concurrent queries running. That’s a good argument for not using 4 — one long-running query using all 4 logical processors isn’t going to be a nice day for lots of chatty little queries.

Really, the question in this situation is whether we want to go with maxdop 1 and effectively disable parallelism, or go with maxdop 2 and have some parallelism.

I would personally start with:

  • Max Degree of Parallelism set to 2
  • Cost Threshold for Parallelism set to 50
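As a sketch, those two changes look like this with sp_configure. Both are advanced options, and both take effect immediately on RECONFIGURE – no restart needed:

```sql
-- Both settings are "advanced options", so expose them first
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- Starting point for a 1-NUMA-node, 4-logical-processor OLTP box
EXEC sys.sp_configure N'max degree of parallelism', 2;
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
RECONFIGURE;
```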

Wait a second, the KB doesn’t talk about Cost Threshold for Parallelism!

I know, that’s what I’d change about the KB.

Remember, there are two parts to going parallel:

  1. Is the query’s estimated cost over the “Cost Threshold for Parallelism”?
  2. If so, how many logical processors is it allowed to use, based on the “Max Degree of Parallelism”?

SQL Server’s default “Cost Threshold for Parallelism” is 5. A cost of 5 QueryBucks is a super low bar these days.

This default was set back in the days when processor power was a LOT MORE SCARCE. Processors have gotten way faster, and you can eat a lot of data pretty quickly with a single logical processor these days.
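You can check what your instance is running with right now by querying sys.configurations:

```sql
-- Current values in use for both parallelism settings
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN (N'max degree of parallelism',
               N'cost threshold for parallelism');
```

On an untouched instance you’ll see 0 for maxdop (use all the cores) and 5 for cost threshold.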

When I was trained as a DBA back on SQL Server 2005, our standard was to raise Cost Threshold to 50 on every server.

11 years later, that has only become less risky. I think it’s a pretty safe default now.

This isn’t a law any more than the magic 8 was a law. It’s just a generalization based on observation.

Would you ever set Max Degree of Parallelism to 1 and disable parallelism?

Sure, if the application was carefully crafted to NEVER need parallelism unless a query hints it higher, maxdop 1 is the way to do that at the server level. SharePoint is famous for this architecture.

But generally parallelism is a good thing: you want to allow parallelism for the queries that need it, so they benefit from it.

“Cost threshold for parallelism” is your setting for determining which queries “need” it, based on their estimated cost.

Want to learn more? Bookmark these resources:

  1. Brent Ozar talks more about these two settings and brings in SQL Server wait stats with his post on CXPACKET
  2. Paul White gave an excellent presentation on parallel query execution at the PASS Summit in 2013. It’s still just as good as when he first presented it. Watch the hour here.
  3. Don’t forget to check out the SQL Server According to Bob blog, by Bob Dorr and Bob Ward of Microsoft. They’ve got that article on Automatic Soft NUMA Configuration and much more cool stuff.
  4. Arvind Shyamsundar wrote a great post on parallel insert in SQL Server 2016 with a comparison of performance at different maxdops. Read it here.



35 Responses to MAXDOP of Confusion (Dear SQL DBA Episode 8)

  1. Rob July 14, 2016 at 11:24 am #

    Thanks Kendra, it is always good to hear this subject as frequently as possible and especially how it relates to the newer versions of SQL.

  2. T Alben July 14, 2016 at 1:57 pm #

    Thank you for this post. Very nice. Reads like a fairy tale. Once upon a time there was a NUMA node, and it had four logical nodes. Their names were …

  3. James July 15, 2016 at 8:14 am #

    Hello Kendra,

    You did a great job explaining what the max dop should be. Would you recommend the max dop calculator?

    I have 24 CPU’s
    2 NUMA Nodes
    6 number of processors per core

    And it recommends 3 as my MAX DOP; I feel that is a low number

    • Kendra Little July 15, 2016 at 10:32 am #

      This is a fun one! I would personally tend to choose between 6, 4, and 2 based on the info you have– but to decide I’d look at what the workload type is.

      Is it a data warehouse workload with mostly reports running? If so, I’d be biased to starting around 6.

      Is it an OLTP workload or a mixed workload? If so, I’d look at the batch requests/sec, and start more at 4 or at 2.

      Before changing the setting, baselining wait stats as well as batch requests /sec over a few days is helpful. If you go to a higher maxdop, you don’t want to see waits like SOS_SCHEDULER_YIELD get worse, indicating that you’ve got contention and that queries are spending a lot of time waiting to get access to a CPU.

      As for why I wouldn’t consider 3, I’m a little old school, and have been biased against odd numbers for years. Linchi Shea wrote a post a while back showing some tests he ran where maxdop 5 was slower than maxdop 4 with an example workload, but at this point the hardware and SQL Server setup aren’t at all current (and I’m not finding a link to the post anyway). So I wonder if my old bias has any grounding in fact anymore at all. It would be an interesting thing to study, it’s on my long term list of things I’d like to experiment with given the right hardware, workload, and available time 🙂

  4. Rudyx July 26, 2016 at 12:43 pm #

    Been following the rules since SQL 2000 – as for cost 30 seems to work well too.
    This article seems a good dove-tail into tempdb file allocation as well too!

  5. Kiran August 27, 2016 at 8:07 pm #

    Hello Kendra,

    Really Good Post.
    Below post is really helpful too,

    We have a data warehouse server with the below configuration,

    Intel Xeon 2.40Ghz 8 Processors
    160 Logical Processors
    8 NUMA node
    20 Logical processors per NUMA node
    2TB RAM

    Current setting :- MAXDOP :- 8 and Cost Threshold of Parallelism = 20.

    What is your suggestion?

    Also, we have 80 instances of mixed workload with the below configuration,

    Intel Xeon 2.13Ghz (2 Processors)
    32 Logical Processors
    2 NUMA Node
    16 Logical processor per NUMA node
    512GB RAM

    Current setting :- MAXDOP = 0 and Cost threshold of parallelism = 50

    What is your suggestion?

    Thanks in Advance.


    • Kendra Little August 30, 2016 at 10:50 am #

      Hey there,

      The 8 socket server is definitely a special beast that will require custom tuning and testing on that server itself. That one’s not guess-able– different hardware has different speeds at doing cross-socket memory access, so it’s really going to vary by your query workload AND your server hardware.

      For your mixed workload servers, the basic recommendation would be to start at maxdop 8 and tune from there on each instance.

      • kiran August 31, 2016 at 5:17 pm #

        Thanks Kendra for your suggestion.

        Really appreciated.

        One more question: in which scenario should MAXDOP be set to 1?

        • Kendra Little September 5, 2016 at 9:18 am #

          Setting maxdop to 1 at the server level disables parallelism by default. Queries can only go parallel if they use a hint to set the degree of parallelism to multiple cores, and otherwise qualify (estimated cost is over the server cost threshold setting, etc).

          Parallelism is typically helpful to expensive queries, so you generally only want to disable it when the application has been specifically designed as a single threaded application and is always tested single threaded. The most famous example of an application designed this way is SharePoint.

  6. Adrian September 10, 2016 at 8:57 am #

    Hi Kendra,

    Thank you for this post, very useful.

    Would the same rules apply to a Virtual Machine?

    I have a SQL Server 2005 VM running on Hyper-V with the following spec.
    8 vCPUs
    128GB RAM
    3 NUMA nodes.

    Each Hyper-V host has the following spec:
    4 CPUs using Hyper threading with 8 cores in each.
    256GB RAM.

    What would you recommend setting MAXDOP to? It is a purely OLTP workload.

    We are already spanning multiple NUMA nodes due to the amount of RAM. We can’t increase the number of vCPUs due to it being at the maximum for the OS. I did think about slightly reducing the amount of RAM used by the VM to see if it would use 2 NUMA nodes instead of 3.

    • Kendra Little September 19, 2016 at 9:32 am #

      In general, you want the SQL Server NUMA setup to “align” / fit within the host server NUMA nodes to avoid foreign memory access.

      Hyper-V does have some counters that let you see how well your VM aligns with the host and how much foreign memory access is happening.

      I gotta be honest, though — a three NUMA node setup with 8 cores which would be split into groups of different sized cores (3, 3, and 2? Not sure) on an unsupported version of SQL Server isn’t something that I’d try to tune for performance.

      If performance is important, I’d work on getting to the latest and greatest versions of SQL Server and the hyper-visor as priority 1, and then work on applying best practices from that point down. Both Hyper-V and SQL Server’s NUMA performance have evolved over time, and documentation and options for the most recent stuff is better now.

  7. Scott September 14, 2016 at 8:19 am #

    Hi Kendra,

    So how would you apply this towards virtual SQL servers where CPU sockets is rather abstract?

    • Kendra Little September 19, 2016 at 8:41 am #

      Hi Scott,

      The thing I remember is that the CPU sockets are not abstract to the SQL Server. In a VM, it sees the virtual sockets presented to it and it goes on those. Hypervisor manufacturers are aware of this– for example, VMware has a best practice guide for SQL Server that discusses how to configure the sockets and NUMA within it.

      It does require VERY careful reading though because of licensing concerns with SQL Server! One of the keys that it discusses is alignment- “If your workload requires more than one core per socket for licensing purposes, or for any other reason, make sure it is aligned with the underlying physical NUMA topology.”

      In other words, the virtual NUMA nodes you define should fit within / align within the NUMA nodes of the host servers.

  8. Brian November 3, 2016 at 8:14 am #

    Hi Kendra,

    I’m trying to help out a friend that currently has a SQL 2008 R2 VM running on a VMware ESXi 5.0 host. The host is an older box that has 2 CPU’s with 4 cores each. No hyper threading available.

    The SQL VM is currently configured with 3 vCPU’s (3 separate sockets, 1 core each). What would be the recommended MAXDOP?

    Any insight would be appreciated, I think the odd number of vCPU’s is throwing me off.


    • Kendra Little November 3, 2016 at 8:55 am #

      It’s pretty uncommon to give SQL Server an odd number of CPUs, like 3. You’re allowed to do it, but you won’t find many rules of thumb for configuring maxdop for that specifically. And you’ll find that a lot of us DBAs are a bit wary of it because of old issues, like one where SQL Server wouldn’t even install if you had an odd number of CPUs.

      So if it was me, I’d be happier with 4 vCPUs and would consider maxdop at 4, 2, or 1, depending on workload.

  9. Pedro Oliveira November 18, 2016 at 7:23 am #

    Hi Kendra,

    Great video… but I’d like to make it more complex, if I may, with hyper-threading….
    I’ve a CPU with 8 physical cores and HT (16 logical cores)..
    SQL NUMA displays 16 logical cores and 16 scheduler count…
    I’ve this set to MAXDOP to 6 since I’ve lots of batches per second.. In this case there’s not much to say..
    But I also have one with 4 cores, 8 logical…
    If I set MAXDOP to 8 or less, say 6, won’t SQL Server use HT cores that are much slower than physical ones?!
    If a query takes 60secs single threaded and SQL splits into 2 threads, one physical and one HT (being HT 30% of physical – I think), one core will take 30secs but the HT core will take 100secs… It’s slower…
    Shouldn’t MAXDOP be the maximum physical cores and not logical cores?!


    • Kendra Little November 18, 2016 at 2:31 pm #

      I think you’ve got some very old numbers from somewhere — hyperthreading on modern processors doesn’t make one thread 70% slower than the other. I have had a couple of clients with two-node failover clusters who had read similar older information. When they set hyperthreading ‘on’ for one node and ‘off’ for the other, their workloads performed as well or better on the node with hyperthreading. (But of course I haven’t tested every processor or every workload.)

      If you don’t want to use hyperthreading, the way to do that is to turn it off at the processor level / BIOS settings. SQL Server will use the logical processors you give it.

  10. Udhayaganesh November 28, 2016 at 9:05 am #

    Hi, please guide me. I have configured max degree of parallelism to 8 for the configuration below.

    Dual processor
    Virtual machine
    12 CPU.


    • Kendra Little November 28, 2016 at 11:59 am #

      You’re in the right place– read this post!

  11. Greg A. January 5, 2017 at 5:37 am #

    Hi Kendra,

    I really enjoyed this article and it, and the links you included, helped clear up a lot of things for me.

    I would just like to add that Soft NUMA will not be engaged if Hyper Threading is enabled and brings the logical processor count over 8. They are not considered physical cores. An 8 core processor with HT will present 16 logical processors but the system knows it’s only 8.

    If some out there have looked in their logs for the Auto Soft NUMA messages and can’t see them they may want to check if HT has bumped up their count.

    • Kendra Little January 5, 2017 at 12:58 pm #

      Great point, thank you! I updated the paragraph on auto soft NUMA to this: “When Automatic Soft NUMA is enabled, if you have more than 8 physical processors in a NUMA node, Soft NUMA will be configured when SQL Server starts up. If you’re running SQL Server in a VM, note that the hypervisor generally presents all virtual cores to the guest as physical cores– whether or not you have hyperthreading enabled on the host server– so this will kick in if you have more than 8 vCPUs.”

      I can’t think of a hypervisor that currently exposes any information about hyperthreading to a guest. But I wouldn’t be surprised to be wrong, either, that seems like something that might change.

  12. BeGrateful January 5, 2017 at 2:33 pm #

    Hi Kendra,
    Thank you for your clear and detailed explanation. I really love your style of teaching. We currently have a DB server on 2016 with 1 CPU socket and 4 cores + hyperthreading. This server is for an OLTP workload. MS recommended that we set our MaxDOP to 8; I was going toward setting MaxDOP to 4 instead. Also, Cost Threshold for Parallelism defaults to 5, and I am planning to set it to 50. What would you recommend?

    Also, in terms of getting a baseline or benchmark, what would be your approach?

    • Kendra Little January 6, 2017 at 7:34 am #

      Thanks for the kind words! This is a great question, and I will write a whole post on it, but here are some notes:

      I am pretty comfortable starting with cost threshold set to 50 and then tweaking from there, probably because that was the standard my team used way back in 2005 and it worked well for our queries then. The main thing is that I’d change that at a different time than changing the maxdop settings, in case you have some important queries that get slower from the cost threshold change.

      For maxdop on an existing server, if I want to lower it, generally I’ll do it somewhat gradually– say go to 6, let it run for a week, then 4.

      For a benchmark, you’ve got a few options. You can use replay, but it’s time consuming to set up and you want to run the replay on an identical hardware setup, which not everyone has. Great if you can dedicate the resources and get it to work.

      If you don’t have replay, before starting I’d identify top queries over a week (ideally longer) in a few categories: by execution count (most frequent queries), by average duration (longest queries), by total CPU usage (most processor consuming), by total reads (biggest IO consumers). You can use these as a benchmark — if they are parameterized queries, though, and they usually are, you have to do extra legwork to figure out what parameters to execute them with, plus what harness to use. Silver lining is that this set of queries becomes incredibly useful for further changes, but it’s a good amount of work.

      If you’re not benchmarking, you at minimum want to identify these queries so you can observe their performance after you make each change.

      Options for identifying/observing top queries are the execution plan cache (may miss things, but fast – free scripts out there on the internet), monitoring software that catches query duration, and/or Query Store (2016+).

      Hope this helps!

  13. Michael Villegas January 27, 2017 at 11:29 am #

    Hey Kendra, great post, excellent advice, it is really helpful. I have one question though: the default value of “max degree of parallelism” is 0 (zero). How bad is it to leave it as is? I know that the KB recommends some numbers based on the CPU specs, but if it remains 0, would it be harmful for the overall server performance?

    • Kendra Little January 27, 2017 at 11:34 am #

      0 means “use all the cores”. I have seen quite a few cases where that setting was slowing things down, and queries measurably sped up quite a bit when the setting was changed.

      That’s not true in all cases, though. In other cases, changing the setting doesn’t make a huge difference. It varies by the hardware and the workload.

  14. Rafael Placido February 27, 2017 at 10:25 pm #

    Hello Kendra

    I love your blog. I have a question about Automatic Soft NUMA. I read the article, and I found it a little tricky to see how they measured the increase in SQL Server performance and how they got those metrics. I want to do the same with my SQL Server 2014, but I am not sure how to take a baseline of performance at this moment, before making the change, even if I have a bad configuration that the vendor recommended.

    To give a little picture of my server: I have 4 NUMA nodes with 16 CPUs each, and max degree of parallelism is 16 (a big mistake, but somebody recommended it that way and the client wants it that way; I think it should be 8 like the KB article says), with cost threshold at 80. I am not sure how to take an initial metric with queries or with another DMV?

    • Kendra Little February 28, 2017 at 8:03 am #

      The only way to know how a workload will perform under a new maxdop is to test the whole workload. You can make some guesses based on wait statistics ahead of time, but you won’t know.

      You may be able to pick out the longest/ most important queries from dynamic management views and test those under a different maxdop using query hints, but it’s going to be tough with queries that make modifications. And you still won’t know how a global settings change will impact all the other queries in the environment.

      When performance is important, teams have identical or near-identical hardware to test against outside of production, and load tests representative of their workload. One option is to use the replay features for this in SQL Server, just know that there is fairly significant time to get it set up and working and plan accordingly.

  15. Matt April 13, 2017 at 6:59 am #

    We are discussing MAXDOP settings and the differences between Standard and Enterprise Editions. Do you know what the limitations are for Standard Edition in 2014/2016 when it comes to configuration? We’ve heard different answers on this to date, even from MS.

    • Kendra Little April 13, 2017 at 8:47 am #

      There’s pretty straightforward documentation on the limits – just check out the documentation on “Compute Capacity Limits” for the version of SQL Server you’re looking at.

      If you’re getting different answers, the issue may be with terminology. The limits on compute power aren’t limits on maxdop per se– they’re limits on sockets/cores. Maxdop is a configuration setting applied to the SQL Server and isn’t the same as a “compute limit”, so if you’re asking about maxdop limits then I can see how you might get confusing answers.

      Does that help?

  16. dan May 5, 2017 at 11:03 am #

    What should the maxdop be if I have 2 NUMA nodes and online_scheduler_count 3 each (6 logical processors total)? Thanks.

    • Kendra Little May 7, 2017 at 2:57 pm #

      The guidance in KB 2806535 is basically this:

      Figure out how many logical processors you have in a NUMA node
      If you have 8 or more logical processors in a NUMA node, generally you’ll get the best performance at maxdop 8 or lower
      If you have less than 8 logical processors per NUMA node, generally you’ll get the best performance setting maxdop to the number of logical processors or lower


  1. Parallelism Configuration Options – Curated SQL - July 18, 2016

    […] Kendra Little discusses max degree of parallelism and cost threshold for parallelism: […]

  2. Why an Upgrade can Cause Performance Regressions (Dear SQL DBA Episode 9) - by Kendra Little - July 21, 2016

    […] For more information on how to determine these two settings, check out my last episode, “Max Degree of Confusion.” […]

  3. Stupid Defaults – nate_the_dba - June 17, 2017

    […] alike.  But don’t ask me — there are plenty of people much smarter than I with blog posts on the topic.  If you put a gun to my head, for CTFP, I’d say “pick your […]
