Does Truncate Table Reset Statistics?

Short answer: the SQL Server optimizer will know that the table was truncated, but statistics might not update when you expect.

For the long answer, let’s walk through an example using the WideWorldImporters sample database. I’ll be using Trace Flag 3604 and 2363 to get SQL Server to print information about how it optimized my query out to the messages tab. (Thanks to Paul White for blogging about this trace flag.)

First, a fresh restore of WideWorldImporters

USE master;
GO

IF DB_ID('WideWorldImporters') IS NOT NULL
ALTER DATABASE WideWorldImporters SET OFFLINE WITH ROLLBACK IMMEDIATE

RESTORE DATABASE WideWorldImporters FROM DISK=
    'S:\MSSQL\Backup\WideWorldImporters-Full.bak'
    WITH REPLACE
GO

USE WideWorldImporters;
GO

Before we do anything, what do the statistics look like on Sales.OrderLines?

Here’s the query that I’m using to inspect the statistics:

SELECT 
    sp.last_updated,
    stat.name as stats_name,
    STUFF((SELECT ', ' + cols.name
        FROM sys.stats_columns AS statcols
        JOIN sys.columns AS cols ON
            statcols.column_id=cols.column_id
            AND statcols.object_id=cols.object_id
        WHERE statcols.stats_id = stat.stats_id and
            statcols.object_id=stat.object_id
        ORDER BY statcols.stats_column_id
        FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)'), 1, 2, '')  as stat_cols,
    sp.modification_counter,
    sp.rows,
    sp.rows_sampled
FROM sys.stats as stat
CROSS APPLY sys.dm_db_stats_properties (stat.object_id, stat.stats_id) AS sp
JOIN sys.objects as so on 
    stat.object_id=so.object_id
JOIN sys.schemas as sc on
    so.schema_id=sc.schema_id
WHERE 
    sc.name= 'Sales'
    and so.name='OrderLines'
ORDER BY 1 DESC
GO

Statistics were last updated on June 2, 2016. We’ll be mostly looking at the statistic on Quantity throughout the example, so I’ve highlighted it:

statistics-before-changes

Let’s run a query that loads the statistic on Quantity

Before we truncate the table, let’s take a peek into how SQL Server optimizes a query that cares about rows in Sales.OrderLines with Quantity > 10. I’m using trace flags 3604 and 2363 to make SQL Server print information about how it used statistics to optimize this to my messages tab.

SELECT *
FROM Sales.OrderLines
WHERE Quantity > 10
    OPTION
(
    QUERYTRACEON 3604,
    QUERYTRACEON 2363,
    RECOMPILE
)
GO

Here’s the info on the messages tab:

Begin selectivity computation

Input tree:

  LogOp_Select

      CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)

      ScaOp_Comp x_cmpGt

          ScaOp_Identifier QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity

          ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)

Plan for computation:

  CSelCalcColumnInInterval

      Column: QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity

Loaded histogram for column QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity from stats with id 7

Selectivity: 0.44231

Stats collection generated: 

  CStCollFilter(ID=2, CARD=102356)

      CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)

End selectivity computation

Estimating distinct count in utility function

Input stats collection:

    CStCollFilter(ID=2, CARD=102356)

        CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)

Columns to distinct on:QCOL: [WideWorldImporters].[Sales].[OrderLines].OrderLineID


Plan for computation:

  CDVCPlanUniqueKey

Result of computation: 102356


(102035 row(s) affected)

Highlights: one of the first thing SQL thinks about is the number of rows in the table

Right at the beginning, we see: “CStCollBaseTable(ID=1, CARD=231412 TBL: Sales.OrderLines)”

That ‘CARD’ number is the optimizer thinking about how many rows are in this table. If you glance back up at the table statistics, the most recent statistic to be updated was on the ‘LastEditedWhen’ column. When that statistic was updated, there were 231,412 rows in the table.

SQL Server decides that it wants detail on the Quantity column to figure out how to run this query, so we see that it loads that statistic up to use: “Loaded histogram for column QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity from stats with id 7”

Alright, let’s truncate this table

I wipe out all the rows with this command:

TRUNCATE TABLE Sales.OrderLines;
GO

Now, I wouldn’t expect truncating the table to automatically update the statistics.

SQL Server updates statistics when they’re used to optimize a query — so if nobody queries this table for six months, I wouldn’t expect the stats to update for six months.

Let’s re-run our query, trace flags and all:

SELECT *
FROM Sales.OrderLines
WHERE Quantity > 10
    OPTION
(
    QUERYTRACEON 3604,
    QUERYTRACEON 2363,
    RECOMPILE
)
GO

The messages tab has less info this time- it’s much more concise!

Begin selectivity computation

Input tree:

  LogOp_Select

      CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)

      ScaOp_Comp x_cmpGt

          ScaOp_Identifier QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity

          ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)

Plan for computation:

  CSelCalcFixedFilter (0.3)

Selectivity: 0.3

Stats collection generated: 

  CStCollFilter(ID=2, CARD=1)

      CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)

End selectivity computation


(0 row(s) affected)

SQL Server knows that we blew away all those rows

This time we see “CARD=1 TBL: Sales.OrderLines”

SQL Server doesn’t like to estimate 0 for empty tables. It likes to estimate 1. It knows this table is empty.

With this information, it chooses a different plan for computation. The plan doesn’t require looking at the quantity column this time– we don’t have any lines about that at all.

But the statistics don’t look any different

You might expect to see that the statistic on Quantity had updated. I expected it, before I ran through this demo.

But SQL Server never actually had to load up the statistic on Quantity for the query above. So it didn’t bother to update the statistic. It didn’t need to, because it knows that the table is empty, and this doesn’t show up in our column or index specific statistics.

To verify, I just rerun my metadata query above, and things look the same:

statistics-after-truncate-and-query 

What if the table has exactly one row?

Let’s insert one and find out:

INSERT INTO [Sales].[OrderLines] (OrderLineID, OrderID, StockItemID, Description, PackageTypeID, Quantity, UnitPrice, TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen)
     VALUES (1, 45, 164, '32 mm Double sided bubble wrap 50m', 7, 50, 112.00, 15.000, 50, '2013-01-02 11:00:00.0000000', 4, '2013-01-02 11:00:00.0000000')
GO

Now we run our familiar query, with all its merry trace flags:

SELECT *
FROM Sales.OrderLines
WHERE Quantity > 10
    OPTION
(
    QUERYTRACEON 3604,
    QUERYTRACEON 2363,
    RECOMPILE
)
GO

And here’s what SQL Server has to say about optimizing that…

Begin selectivity computation

Input tree:

  LogOp_Select

      CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)

      ScaOp_Comp x_cmpGt

          ScaOp_Identifier QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity

          ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=10)

Plan for computation:

  CSelCalcColumnInInterval

      Column: QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity

Loaded histogram for column QCOL: [WideWorldImporters].[Sales].[OrderLines].Quantity from stats with id 7

Selectivity: 1

Stats collection generated: 

  CStCollFilter(ID=2, CARD=1)

      CStCollBaseTable(ID=1, CARD=1 TBL: Sales.OrderLines)

End selectivity computation


(1 row(s) affected)

One row is enough to use our column statistic

Looking at the beginning, CARD=1 for Sales.OrderLines, just like it did after we truncated the table. But SQL Server does something different this time, indicating that it now knows that the table isn’t really empty.

It goes back to the CSelCalcColumnInInterval plan to optimize. And it loads up the column stat for the Quantity column.

Since this statistic was loaded into memory, it should have auto-updated based on my database settings. Sure enough, it did:

statistics-after-truncate-and-query-and-insert

SQL Server knows when you’ve truncate a table

And the fact that the table has been truncated may mean that it doesn’t need to use statistics on the table when optimizing queries. After all, it’s an empty table, so it can take shortcuts!

So don’t get too confused if statistics look way out of date for a truncated table. Instead, ask yourself, “why am I querying a truncated table?” (Related disclaimer: I only tested this on SQL Server 2016.)

Want to learn more about statistics in SQL Server? Start here.

 

 

 

Previous Post
When Did SQL Server Last Update That Statistic? How Much Has Been Modified Since? And What Columns are in the Stat?
Next Post
Index Create and Last Modified Dates in SQL Server

Related Posts

4 Comments. Leave new

Great post Kendra, really good read on a very interesting subject!

Another technique you could use is Trace Flag 8666 which mines the plan cache and shows hidden information like what statistic objects were used to compile an execution plan. You’d have to remove the RECOMPILE hint to use this method obviously and it won’t work in the scneario where you issue the query after truncating the table because SQL Server doesn’t cache trivial plans where no statistic objects were loaded by the QO (it does compile some trivial plans though in other scenarios).

I’ve tested the effect truncating tables has on statistics before and can confirm the behaviour is the same under the new cardinality estimator model versions (120 & 130) but the behaviour is slightly different under the legacy cardinality estimator model version 70. Under the legacy CE, it produces the same estimate of one row but it also loads the statistic object (_WA_Sys_00000006_05A3D694) and invalidates the statistic but does not update it. You can see this by running

SELECT *
FROM Sales.OrderLines
WHERE Quantity > 10
OPTION
(
QUERYTRACEON 9481, QUERYTRACEON 3604, QUERYTRACEON 9292, QUERYTRACEON 9204
)
GO

Note that in the messages tab, you will see the below and pay attention in particlar to “Empty Table = True”

Stats header loaded: DbName: WideWorldImporters, ObjName: Sales.OrderLines, IndexId: 7, ColumnName: Quantity, EmptyTable: TRUE

Stats loaded: DbName: WideWorldImporters, ObjName: Sales.OrderLines, IndexId: 7, ColumnName: Quantity, EmptyTable: TRUE

Also note that because the statistic object was loaded when the query was optimised under the legacy cardinality estimator that you will be able to use the Trace Flag 8666 this time even though the plan was trivial.

I’m going to link this post in my blog post below as it’s a great example that I wish I had included in mine originally, hope you don’t mind 🙂

https://sqlserverscotsman.wordpress.com/2016/11/10/statistics-cardinality-estimator-model-variations/

Reply

[…] The inclusion assumption dictates that we query for data that does exist and as such the cardinality estimator will (almost) never estimate zero rows.  Even if a query selects from a table that has been truncated, the query optimiser will still estimate one row despite the table being completely empty as pointed out by Kendra Little here. […]

Reply

Good to know. So SQL Optimizer will ignore what’s in the sys.stats table since it knows the table has been truncated, which you confirmed by running the DBCC commands.

Bottom line don’t rely on what’s in the table i.e. sys.stats all the time correct ?

Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Menu