Merging Boundary Points: Does a Changing Partition_Number Indicate Data Movement?

merging-rectangle

I received a question from a reader a while back who was testing out a partitioning architecture:

We are testing table partitioning using one filegroup per partition. When we merge a boundary point, we see that partition_number changes in sys.partitions. Does this mean that data movement is occurring?

Short Answer: Just because partition_number is changing does NOT mean there is “data movement”.

Better Answer: You can tell for sure whether data movement  is happening in a test environment by having a script that shows how much data is in each filegroup before and after you merge! This is worth doing so you know exactly what’s going on with your setup.

Let’s take a look at the ‘Better Answer’ here.

Our Test Partition Setup: the mergethis database

This script creates a testing database named merge this with an extra 10 filegroups. I’ve been irresponsible and put them on the C drive, but I bet you’re smarter than that and would never risk filling up your system drive.

SET NOCOUNT ON;
GO

use master;
GO

IF DB_ID('mergethis') IS NOT NULL
BEGIN
    ALTER DATABASE mergethis SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE mergethis;
END
GO

CREATE DATABASE mergethis
GO

ALTER DATABASE mergethis
    ADD FILEGROUP fg_mergethis_1;
GO
ALTER DATABASE mergethis
    ADD FILEGROUP fg_mergethis_2;
GO
ALTER DATABASE mergethis
    ADD FILEGROUP fg_mergethis_3;
GO
ALTER DATABASE mergethis
    ADD FILEGROUP fg_mergethis_4;
GO
ALTER DATABASE mergethis
    ADD FILEGROUP fg_mergethis_5;
GO
ALTER DATABASE mergethis
    ADD FILEGROUP fg_mergethis_6;
GO
ALTER DATABASE mergethis
    ADD FILEGROUP fg_mergethis_7;
GO
ALTER DATABASE mergethis
    ADD FILEGROUP fg_mergethis_8;
GO
ALTER DATABASE mergethis
    ADD FILEGROUP fg_mergethis_9;
GO
ALTER DATABASE mergethis
    ADD FILEGROUP fg_mergethis_10;
GO


ALTER DATABASE mergethis ADD FILE ( NAME = file_mergethis_1, FILENAME = 'C:\MSSQL\DATA\fg_mergethis_1.ndf') TO FILEGROUP fg_mergethis_1;
GO
ALTER DATABASE mergethis ADD FILE ( NAME = file_mergethis_2, FILENAME = 'C:\MSSQL\DATA\fg_mergethis_2.ndf') TO FILEGROUP fg_mergethis_2;
GO
ALTER DATABASE mergethis ADD FILE ( NAME = file_mergethis_3, FILENAME = 'C:\MSSQL\DATA\fg_mergethis_3.ndf') TO FILEGROUP fg_mergethis_3;
GO
ALTER DATABASE mergethis ADD FILE ( NAME = file_mergethis_4, FILENAME = 'C:\MSSQL\DATA\fg_mergethis_4.ndf') TO FILEGROUP fg_mergethis_4;
GO
ALTER DATABASE mergethis ADD FILE ( NAME = file_mergethis_5, FILENAME = 'C:\MSSQL\DATA\fg_mergethis_5.ndf') TO FILEGROUP fg_mergethis_5;
GO
ALTER DATABASE mergethis ADD FILE ( NAME = file_mergethis_6, FILENAME = 'C:\MSSQL\DATA\fg_mergethis_6.ndf') TO FILEGROUP fg_mergethis_6;
GO
ALTER DATABASE mergethis ADD FILE ( NAME = file_mergethis_7, FILENAME = 'C:\MSSQL\DATA\fg_mergethis_7.ndf') TO FILEGROUP fg_mergethis_7;
GO
ALTER DATABASE mergethis ADD FILE ( NAME = file_mergethis_8, FILENAME = 'C:\MSSQL\DATA\fg_mergethis_8.ndf') TO FILEGROUP fg_mergethis_8;
GO
ALTER DATABASE mergethis ADD FILE ( NAME = file_mergethis_9, FILENAME = 'C:\MSSQL\DATA\fg_mergethis_9.ndf') TO FILEGROUP fg_mergethis_9;
GO
ALTER DATABASE mergethis ADD FILE ( NAME = file_mergethis_10, FILENAME = 'C:\MSSQL\DATA\fg_mergethis_10.ndf') TO FILEGROUP fg_mergethis_10;
GO

Now Create a Partition Function, Partition Scheme, and a Table

Now we create a partition function named pf_merge_this, which specifies that we’re partitioning by a column with the date type. We have nine boundary points– we’re partitioning by every one day. We’re using RANGE LEFT in this example.

Our script then creates a partition scheme that maps each partition to a single filegroup. This is also a choice. The more filegroups you decide to use, the more complex your code becomes if you’re splitting and merging boundary points.

use mergethis;
GO

CREATE PARTITION FUNCTION pf_mergethis (date)
    AS RANGE LEFT
    FOR VALUES ( 
        '2016-01-01', '2016-02-01', '2016-03-01', '2016-04-01', '2016-05-01', '2016-06-01', '2016-07-01', '2016-08-01', '2016-09-01');
GO

CREATE PARTITION SCHEME ps_mergethis
    AS PARTITION pf_mergethis
    TO ([fg_mergethis_1],[fg_mergethis_2],[fg_mergethis_3],[fg_mergethis_4],[fg_mergethis_5],[fg_mergethis_6],[fg_mergethis_7],[fg_mergethis_8],[fg_mergethis_9],[fg_mergethis_10]);
GO

CREATE TABLE dbo.mergetest (
    mypartitioningcolumn date not null,
    makemeunique int identity (1,1),
    stuffnthings char(10) default ('stuff'),
    thingsnstuff char(10) default ('things')
) on ps_mergethis(mypartitioningcolumn);
GO

CREATE UNIQUE CLUSTERED INDEX cx_mergetest on dbo.mergetest(mypartitioningcolumn, makemeunique);
GO

insert dbo.mergetest (mypartitioningcolumn) values ('2016-02-01');
GO 20
insert dbo.mergetest (mypartitioningcolumn) values ('2016-03-01');
GO 30
insert dbo.mergetest (mypartitioningcolumn) values ('2016-04-01');
GO 40
insert dbo.mergetest (mypartitioningcolumn) values ('2016-05-01');
GO 50
insert dbo.mergetest (mypartitioningcolumn) values ('2016-06-01');
GO 60
insert dbo.mergetest (mypartitioningcolumn) values ('2016-07-01');
GO 70

Where’s All Our Data? Let’s Query the Metadata

We inserted 20 rows for February 1, 30 rows for March 1, 40 rows for April 1, and so on. We can query how this is mapped to our filegroups:

SELECT
    fg.name as fg_name,
    pf.name as partition_function_name,
    ps.name as partition_scheme_name,
    pf.boundary_value_on_right,
    p.partition_number,
    prv.value as boundary_point,
    so.name as table_name,
    si.name as index_name,
    stat.row_count,
    stat.reserved_page_count * 8./1024./1024. as reserved_gb
FROM sys.indexes as si 
JOIN sys.objects as so on si.object_id = so.object_id
JOIN sys.partitions as p on 
    si.object_id=p.object_id 
    and si.index_id=p.index_id
JOIN sys.data_spaces as ds on si.data_space_id = ds.data_space_id
JOIN sys.partition_schemes as ps on si.data_space_id=ps.data_space_id
JOIN sys.partition_functions AS pf on ps.function_id=pf.function_id
LEFT JOIN sys.allocation_units as au on p.hobt_id=au.container_id
LEFT JOIN sys.filegroups as fg on au.data_space_id=fg.data_space_id
LEFT JOIN sys.dm_db_partition_stats as stat on stat.object_id=p.object_id
    and stat.index_id=p.index_id
    and stat.index_id=p.index_id and stat.partition_id=p.partition_id
    and stat.partition_number=p.partition_number
LEFT JOIN sys.partition_range_values as prv on prv.function_id=pf.function_id
    and p.partition_number =  prv.boundary_id + pf.boundary_value_on_right 
WHERE ps.name='ps_mergethis'
GO

partitioning-filegroup-mapping-partition_number_1_empty

This all looks in order. We have a boundary point for January 1, but we didn’t insert any rows for it. It’s mapped to the fg_mergethis_1 filegroup and is currently partition_number 1.

And Now We Merge

The question I got was about merging, so let’s merge! We’re going to get rid of the January 1 Boundary Point:

ALTER PARTITION FUNCTION pf_mergethis () MERGE RANGE ('2016-01-01');
GO

Blammo, it’s gone!

Rerunning our query for the metadata above, here’s what the table looks like now:

partitioning-filegroup-mapping-partition_number_1_after_merge

We blew away a partition — the partition mapped to fg_mergethis_1 is now totally gone. Partition_number 1 has been reassigned to the partition with a boundary point of February 1.

But data didn’t move when we merged the partitions in this case. There were always 20 rows in the fg_mergethis_2 filegroup.

Partition_number is a logical construct. When it changes you don’t necessarily have a data movement.

Data Movement Matters — It Can Be Seriously Slow

It’s completely valid to be concerned about data movement. “Data movement” is just what it sounds like: data being physically moved from one partition to another. It’s fully logged and can cause a huge performance hit. If you’re planning out partitioning, you need to carefully:

Additional Thought: Do You Even Need to Merge, Bro?

When you’re designing partitioning, it’s always worth thinking about whether a rotating log design could work for you. Avoiding merge and split makes life easier in the long run if possible, so don’t forget this architecture when you’re in the design phase.

Previous Post
Lost in Performance Troubleshooting – Kendra’s Perf Triage for SQL Server (Dear SQL DBA)
Next Post
Next Door to Derpton – When Your Fellow DBA is a Danger to Databases (Dear SQL DBA)

Related Posts

3 Comments. Leave new

  • Bodhi Densmore
    June 7, 2016 12:48 pm

    CREATE UNIQUE CLUSTERED INDEX cx_mergetest on dbo.mergetest(mypartitioningcolumn, makemeunique);

    The unique index does not have to have the partitioning column as the first key-column.
    Is it not better to put the most unique column first when defining the clustering index?

    Reply
    • It will vary depending on query patterns and the partitioning setup.

      In this case, I’ve got a date granularity and just one day per partition, so I agree that putting the partitioning column first doesn’t seem to add much value — if someone’s querying just by the partitioning column alone and no other columns, they should be able to get partition elimination even if makemeunique is first. So yes, I see your point!

      For other data types where it’s more granular and you might query by the partitioning column and have a range of values in the partition, being able to seek on that column could be extremely valuable if that’s frequently done without other predicates.

      For that reason, I wouldn’t say that it’s a general rule to put the more unique column first. I would always look at the query patterns and granularity.

      Reply
  • […] #: This can be useful in some queries. Remember that partition numbers are reassigned when you modify your partition function […]

    Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Menu