March 9, 2021

T-SQL Tuesday #136: Your Favorite or Least Favorite Data Type

This month’s topic for T-SQL Tuesday comes from Brent Ozar and asks to discuss your favorite or least favorite data type. The full topic invitation can be found by clicking the image below:

Least of All

My mind quickly went to the uniqueidentifier (GUID) data type. It may not be fair but I think of it as my least favorite. The reasoning is more of a pet peeve. Most of the time there’s nothing wrong with the uniqueidentifier data type; however, it makes me cringe if it is the clustering key on a table when an INT would do just fine because it ends up wasting disk space.

It’s Not All Bad

A quick disclaimer: I know it’s 2021. Hardware is getting better and cheaper every year. Problems that may have been more concerning five or ten years ago may not be as noticeable today. I doubt that fragmentation from using a uniqueidentifier as the clustering key is going to single-handedly cripple your SQL server. I understand there are times when uniqueidentifier may even be the best option. For example, if you know you will be merging table records from multiple databases into a single database table then uniqueidentifier might be the answer so that you don’t have to worry about handling duplicate INT values.

With that said, if it’s not a requirement then I would avoid uniqueidentifier and use INT instead. It’s all about using the right type at the right time. You wouldn’t use VARCHAR(MAX) for a middle initial of a name or use BIGINT for an area code, right?

GUID vs. INT

I created two tables to observe the size difference. One table had a uniqueidentifier primary key as the clustering key and one had an INT primary key as the clustering key. I inserted 100000 rows into each and compared. The table with an INT Id is about half the size:

SELECT t.name AS 'Table'
	,p.rows AS 'Rows'
	,CAST(ROUND(((SUM(a.total_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS 'Total_MB'
FROM sys.tables t
INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
GROUP BY t.Name, p.Rows;
GO

Also, since the clustering key will be added to any nonclustered indexes as the lookup value, we’ll see higher page counts and fragmentation on any added indexes.

For a smaller table in a single database this isn’t as big of a deal. Where this can be a pain point is when uniqueidentifier is on a large table, for multiple tables in a database, and for multiple databases on a server.

If a table is created without a true need for uniqueidentifier and INT would be sufficient, disk space is being needlessly wasted. Maybe this isn’t considered as important for your organization. More disk space can be allocated for data files and you can move onto the next task. But what about backups and restores? Backups are going to be larger which means backups and restores are going to take more time to complete. On a smaller scale this might not be as concerning but as your databases grows it could become more costly.

Thanks for reading!

7 thoughts on “T-SQL Tuesday #136: Your Favorite or Least Favorite Data Type”

Pingback: Fragmentation and GUIDs – Curated SQL
Jeff Moden says:

March 13, 2021 at 3:23 pm

Thanks for taking the time to write this. I appreciate anyone that will “step up to the plate” to do so.

I do have a couple of interesting comments, though…

While I agree that a GUID is 4 times the size of an INT and 2 times the size of a BIGINT and also agree with a several other points concerning their length including but not limited to the fact that the key column(s) of the Clustered Index will auto-magically be bolted on to at least the Leaf Level of Non-Clustered Indexes, it may well be worth it. It turns out that Random GUID keyed indexes are the epitome of what most people actually expect a good index to behave as.

1. They easily allow for very high insert rates because they have no “Hot Spot” and Method 4 of the following MS Article actually covers that.
https://docs.microsoft.com/en-US/troubleshoot/sql/performance/resolve-pagelatch-ex-contention

2. The supposed fragmentation problem of Random GUIDs is actually a myth perpetuated by some really bad practices. Those bad practices are actually what most of the SQL Server world has adopted as a “Best Practice” for Index Maintenance. The biggest issue there is because a lot well meaning folks have made a serious mistake in interpreting what REORGANIZE actually does. That mistake is rather clearly explained in the paragraph that that contains the sentences of “Note that ALTER INDEX REORGANIZE cannot reduce page fullness. Instead, an index rebuild must be performed.” of the following MS Article.
https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-index-physical-stats-transact-sql?view=sql-server-ver15

In reality and if properly setup and maintained, you can literally insert MILLIONS of rows into a Random GUID Clustered Index with < 1% logical fragmentation, almost no page splits ("Good" or "Bad" ones) and it only gets better with time and scale.

I recently did a 2.5 hour presentation on Random GUIDs for Kevin Feasel's "SHOPTALK" on the subject, which can be found at the following YouTube link that explains it all including why the myth exists and demonstrates what happens if you make the mistake of using REORGANIZE on Random GUID keyed indexes. Unlike the presentation, the "Fix for Fragmentation" takes only seconds to implement.

LikeLike

Reply
1. CallihanData says:
  
  March 13, 2021 at 5:21 pm
  
  Thanks for the comments and article links! That makes sense where using GUID for high insert rates would be a solution for PAGELATCH_EX waits.
  
  I’m setting aside time tomorrow to watch the YouTube presentation and will be sure to bring my water-cooled helmet as the description recommends!
  
  LikeLike
  
  Reply
2. nbsteve says:
  
  March 15, 2021 at 11:17 am
  
  Haha, as I read this article I was thinking to myself “Wow, I just watched a Jeff Moden video that would have a thing or two to say about this!” And sure enough, I get to the bottom and see that Jeff Moden has a thing or two to say and links the same video. 🙂
  
  Thanks Chad for the article, and thanks Jeff for the video! I do recommend Jeff’s video, it’s lengthy but very enlightening!
  
  LikeLike
  
  Reply
  1. CallihanData says:
    
    March 15, 2021 at 12:44 pm
    
    Jeff’s video is definitely worth checking out. Thanks for the comment!
    
    LikeLike
Jeff Moden says:

March 15, 2021 at 1:01 pm

Thanks for the feedback and the recommendation, Steve.

LikeLike

Reply
Jeff Moden says:

March 15, 2021 at 1:03 pm

You, too, Chad. And, again, thank you for the article above.

LikeLike

Reply