SPLIT in a LEFT Partition Function: Where Does the Above-Boundary Data Go?

Where'd I put that data, squirrel?

Table partitioning seems simple, but there’s a lot of complexity in designing and managing it if you decide to use filegroups and splitting.

When you first implement partitioning in this scenario, you decide where you’re going to keep “out of bound” data when you create your partition scheme. Be careful when you make that decision, because it may not be easy to change later.

Let me show you what I mean.

Meet our LEFT based partition function

Let’s say we want to partition a table on a column that uses the datetime data type. We want to partition on a daily basis and assign each partition to its own filegroup.

We’ve decided to use a “LEFT” based partition function, meaning that each boundary point is an “upper” bound: the boundary value itself sticks with the partition to its left.

We don’t want data time-stamped exactly at midnight to fall into the previous day’s partition, so we make each boundary point the very last moment of the day that the DATETIME data type can store, like this:

CREATE PARTITION FUNCTION pf_splitthis (datetime)
    AS RANGE LEFT
    FOR VALUES (
        '2015-12-31 23:59:59.997',
        '2016-01-01 23:59:59.997',
        '2016-01-02 23:59:59.997',
        '2016-01-03 23:59:59.997'
    );
GO

Yep, the precision of ye olde DATETIME datatype is such that .997 is the last moment in the day.
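
If you’re skeptical, here’s a quick sanity check (not part of the partitioning setup itself). DATETIME is only accurate to roughly 3 milliseconds and rounds to increments ending in .000, .003, or .007, so .997 is the last value that stays within the day, while .999 rounds up into the next day:

-- DATETIME rounding check: .997 is a valid tick, .999 rounds up to midnight of the next day
SELECT
    CAST('2015-12-31 23:59:59.997' AS datetime) AS last_moment_of_the_day,
    CAST('2015-12-31 23:59:59.999' AS datetime) AS rounds_up_to_next_day;
GO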

The partition scheme maps the data

After I create my partition function, I need to map where the data goes. I do that with this code:

CREATE PARTITION SCHEME ps_splitthis
    AS PARTITION pf_splitthis
    TO ([fg_splitthis_A],[fg_splitthis_B],[fg_splitthis_C],[fg_splitthis_D],[fg_outofbounds]);
GO

With my left-based partition function, here’s what the data assignment looks like:

[Screenshot: boundary point to filegroup map]

Note that there is no boundary point mapped to fg_outofbounds.

Our highest boundary point is 2016-01-03 23:59:59.997, which is mapped to the filegroup to its LEFT: fg_splitthis_D.

I create a table on the partition scheme and insert some rows for January 1, January 2, and January 3 (a sketch of what that might look like is below). Here’s what the metadata looks like after that.
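
The exact table definition isn’t shown in the post, so here’s a minimal sketch. The table and column names (dbo.splitthis, collection_time, some_value) are just placeholders for illustration:

-- Hypothetical table partitioned on the datetime column, placed on the partition scheme
CREATE TABLE dbo.splitthis (
    collection_time datetime NOT NULL,
    some_value int NULL
) ON ps_splitthis (collection_time);
GO

-- A few rows for January 1, 2, and 3
INSERT dbo.splitthis (collection_time, some_value)
VALUES ('2016-01-01 08:00', 1),
       ('2016-01-02 08:00', 2),
       ('2016-01-03 08:00', 3);
GO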

[Screenshot: filegroup mapping before the split]
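
The screenshot isn’t reproduced here, but you can pull the same information from the catalog views. Here’s one way to do it (a sketch, using the placeholder table above); for a LEFT function, the boundary value shown is the upper bound of that partition:

SELECT
    p.partition_number,
    fg.name AS filegroup_name,
    prv.value AS upper_boundary_point,
    p.rows
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON p.object_id = i.object_id
    AND p.index_id = i.index_id
JOIN sys.partition_schemes AS ps
    ON i.data_space_id = ps.data_space_id
JOIN sys.destination_data_spaces AS dds
    ON ps.data_space_id = dds.partition_scheme_id
    AND p.partition_number = dds.destination_id
JOIN sys.filegroups AS fg
    ON dds.data_space_id = fg.data_space_id
LEFT JOIN sys.partition_range_values AS prv
    ON ps.function_id = prv.function_id
    AND p.partition_number = prv.boundary_id
WHERE p.object_id = OBJECT_ID('dbo.splitthis')
    AND i.index_id IN (0, 1)  -- heap or clustered index only
ORDER BY p.partition_number;
GO

Re-running the same query after the SPLIT below shows the new partition appear on fg_splitthis_E, while the top partition stays on fg_outofbounds.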

Now I SPLIT

I want to add a new partition, assigned to a new filegroup. I add the filegroup and file to my database.

To split, I need to tell SQL Server which filegroup should receive the data for the next boundary point I define. I do this by setting the NEXT USED property with ALTER PARTITION SCHEME. Then I can add my boundary point:

ALTER PARTITION SCHEME ps_splitthis 
NEXT USED fg_splitthis_E;
GO

ALTER PARTITION FUNCTION pf_splitthis() SPLIT RANGE ('2016-01-04 23:59:59.997');
GO

After making this change, here’s what my filegroup mapping looks like:

[Screenshot: filegroup mapping after the split]

Our new boundary point specifies that data for January 4 goes on fg_splitthis_E.

Data above the last boundary point (now the January 4 boundary) still goes to fg_outofbounds

I can continue to set NEXT USED and split, but that won’t change the filegroup assignment for data above the last boundary point in my LEFT based partition function. fg_outofbounds is where any data above the highest boundary point will go, whatever that boundary value happens to be. Its partition_id doesn’t change when I SPLIT.
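
A quick way to see this for yourself, using the objects above, is to ask which partition number a date beyond the highest boundary point would land in. It comes back as the top partition, the one the metadata query earlier showed mapped to fg_outofbounds:

-- Which partition would a date above the highest boundary point land in?
SELECT $PARTITION.pf_splitthis('2016-02-01') AS partition_number;
GO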

Why do people care?

This might be very confusing if I’d happened to name that filegroup fg_splitthis_20160104, or something else that looked like it was assigned to a specific date.

Can I change fg_outofbounds to a different filegroup?

Here’s where it gets awkward. The only ALTER option for PARTITION SCHEME is to set NEXT USED, and NEXT USED only controls which filegroup will accept the next new partition created by a SPLIT. That doesn’t help us.

The only way I know of to change where data above the highest boundary point in a LEFT based partition scheme goes is to do something like this in an outage window (a rough sketch of the key steps follows the list):

  • Make sure the partition using fg_outofbounds is empty. (If not, add the appropriate filegroup, etc)
  • Create a new partition scheme on your partition function with all the filegroups the same, except for fg_outofbounds (which is wherever you want it to be)
  • Create a new table that matches your old table (except for names) on the new partition scheme with matching indexes and constraints
  • Switch all the non-empty partitions to the new table
  • Make sure identities are properly seeded
  • Rename everything
  • Cross every finger and toe that I didn’t forget something on this list
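
Here’s a rough sketch of what the partition scheme, table, switch, and rename steps might look like with the objects from this post. It’s illustrative only: the new names (ps_splitthis_v2, dbo.splitthis_v2, fg_outofbounds_new) are made up, the table definition is the placeholder one from earlier, and in real life you’d also script out indexes, constraints, permissions, and anything else on the original table. The switch works because every partition except the (empty) top one stays on the same filegroup in both schemes.

-- New scheme on the same function; only the top, "out of bounds" filegroup changes
CREATE PARTITION SCHEME ps_splitthis_v2
    AS PARTITION pf_splitthis
    TO ([fg_splitthis_A],[fg_splitthis_B],[fg_splitthis_C],[fg_splitthis_D],[fg_splitthis_E],[fg_outofbounds_new]);
GO

-- New table with a matching definition on the new scheme
CREATE TABLE dbo.splitthis_v2 (
    collection_time datetime NOT NULL,
    some_value int NULL
) ON ps_splitthis_v2 (collection_time);
GO

-- Switch the non-empty partitions across (source and target must be on the same filegroup)
ALTER TABLE dbo.splitthis SWITCH PARTITION 2 TO dbo.splitthis_v2 PARTITION 2;
ALTER TABLE dbo.splitthis SWITCH PARTITION 3 TO dbo.splitthis_v2 PARTITION 3;
ALTER TABLE dbo.splitthis SWITCH PARTITION 4 TO dbo.splitthis_v2 PARTITION 4;
GO

-- Swap the names
EXEC sp_rename 'dbo.splitthis', 'splitthis_old';
EXEC sp_rename 'dbo.splitthis_v2', 'splitthis';
GO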

That doesn’t sound like much fun. (If you know of a less painful way, let me know in the comments!)

And honestly, usually it’s not worth fixing. We probably shouldn’t be allowing data in outside the boundary points anyway.
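
If you want to enforce that, one option (my sketch, not something from the post) is a CHECK constraint on the partitioning column, so rows above the highest boundary point are rejected instead of quietly landing in fg_outofbounds. You’d have to remember to alter it each time you split in a new boundary:

-- Reject rows above the current highest boundary point
ALTER TABLE dbo.splitthis WITH CHECK
    ADD CONSTRAINT ck_splitthis_collection_time_upper
    CHECK (collection_time <= '2016-01-04 23:59:59.997');
GO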

But yeah, managing this isn’t so easy.

Takeaways: Be Careful When Designing Partitioning

It’s not actually so easy to change everything later. Be careful deciding how you want to use filegroups, and how many you want to manage. More info on making that decision is in my post here.

 

 

6 Comments

  • Ray Herring
    July 5, 2016 7:08 am

    Hi Kendra,
    The more I learn about the Table Partitioning feature the less I like it :).

    I have been restricted to Standard Edition for most of my career, so I had to build a solution using Updateable Partitioned Views. It was a good bit of work to implement and took a lot of testing to get a solid, generic solution, but overall I find it much more flexible and controllable than what I see in Table Partitioning.

    My approach certainly has limitations (e.g., no “unbounded”) but it automagically adds new date based partitions (Day, Week, Month, Quarter, Annual) and drops off “old” partitions. It also supports dropping or archiving “old” partitions.

    We have migrated about a dozen of our most active transactional tables and found great throughput improvements, since we don’t have to clean up the old data with constantly running “delete” jobs.

    • That’s awesome that partitioned views are working well for you!

      I actually quite like both approaches, I just find table partitioning to be more complicated than it seems.

  • This approach isn’t necessarily better, but you could merge the target partition, then re-create it using the split after you use alter partition scheme to set the next file group you want to use. Then the new partition that you’re creating would end up in the new file group.

    Of course, this will mean that you’re moving data, potentially a lot of data, twice!!
    But you do avoid having to recreate a whole new table, indexes, and partition scheme with different names.
    You can preserve all names.

  • I am using this method, but all on primary for simplicity. When I do the first split, the data loads into the partition as expected, but moving on to the second, it loads into the same partition even though the integer value is equal to the boundary value. Any ideas?

    • Hey Mark — I’m afraid I’m getting a bit lost in your question. Language is hard. Do you have a small amount of sample code you could share in a gist or something similar that outlines the problem you’re seeing?

      • You can disregard. The tool we are using to load the data was caching the value I was using to partition, so it looked like I was changing the value for the batch in the UI, but it wasn’t actually changing anything. Fun one to troubleshoot. Thanks.

