The story of the repeat blocker (4 minutes)


Our first blocking story is the case of the repeat blocker.

When I started out as a SQL Server database administrator, I was interested in performance from the start. I did lots of research on how to configure my SQL Server how to configure Windows, and how to get the best hardware to leverage SQL Server’s strengths and try to get consistent, great performance.

But it wasn’t very easy

Early on in my career, we started having an incident where customers were complaining about the application being slow. And science pointed to the problem being in the SQL Server.

The SQL Server, people thought, was freezing up

When I looked at my performance counters that I was collecting, at the times when the SQL Server “froze up”, the CPU usage looked normal, my disk latency counters looked normal too. So I couldn’t just blame the SAN team.

Sort of the cycle of life. The application tier is like, “Oh the problem’s in the SQL Server.”

And the SQL Server people say, “Oh, well, the problem is in the SAN.” I didn’t have any evidence to do that.

I also didn’t have any errors in my Windows log or my SQL Server log.

Looking at the performance counters, you would never know that anything had been going on in an unusual way.

But the problem kept happening. And what would happen was like this…

A customer would be working, in the online application. And they push a button, and the screen would freeze up in front of them.

They had to get their work done, so they would start a ticket with our helpdesk.

The helpdesk would look at the environment. They wouldn’t see anything unusual on the application servers.

They then engaged the database administrator on call. Who would log in, look at the SQL Server, and when they ran the built-in procedure sp_who2, well, it would get a lot of rows back.

This was a busy SQL Server

The built-in procedure sp_who2, it returned a lot of rows, in sort of a random order.

But it would show that one or more sessions was blocked by another session. Sp_who2 doesn’t give you a ton of detail about what sessions are doing in SQL Server, but it will show blocking.

The DBA on call would see this blocking, they would take a deep breath, they would try to feel hopeful, and they would kill off the blocker to let the block session go through.

After they did this, the customer would report that the screen unfroze and they were able to finish their work.

Killing a session is always scary

Now the DBA killing this session, this is always scary when you kill a session because there’s couple things. The session that you kill, whatever application issued that session is gonna have to be smart enough to catch that it failed and retry it and redo the work. Right? You’re gambling whenever you’re killing that the application is that smart.

There’s also the risk that the rollback could be long and painful and cause blocking while the rollback is in process. Rollback is mostly single-threaded and it can be really slow.

In these cases we were lucky ‘cause this particular blocker, who kept coming back, it hadn’t done a lot of work, and its rollback was fast so at least we were lucky in that sense but the problem kept happening.

We couldn’t find what had changed recently.

Was it this particular blocker application that had changed? What was the issue? We didn’t even know at first, who was it really who was doing the blocking?

We need more information

We need to figure out, what type of lock waits do we have? What queries are involved in this blocking?

And we also need to be more proactive, because using our customers as our monitoring system, doesn’t make the customers that happy

There are lots of free tools in SQL Server that will help us accomplish this. So let’s dive into management studio and take a look at how you can do all of these things to get alerted and find out tons of information about blocking.