This month’s topic for T-SQL Tuesday comes from Brent Ozar and asks to discuss your favorite or least favorite data type. The full topic invitation can be found by clicking the image below:
Least of All
My mind quickly went to the uniqueidentifier (GUID) data type. It may not be fair but I think of it as my least favorite. The reasoning is more of a pet peeve. Most of the time there’s nothing wrong with the uniqueidentifier data type; however, it makes me cringe if it is the clustering key on a table when an INT would do just fine because it ends up wasting disk space.
It’s Not All Bad
A quick disclaimer: I know it’s 2021. Hardware is getting better and cheaper every year. Problems that may have been more concerning five or ten years ago may not be as noticeable today. I doubt that fragmentation from using a uniqueidentifier as the clustering key is going to single-handedly cripple your SQL server. I understand there are times when uniqueidentifier may even be the best option. For example, if you know you will be merging table records from multiple databases into a single database table then uniqueidentifier might be the answer so that you don’t have to worry about handling duplicate INT values.
With that said, if it’s not a requirement then I would avoid uniqueidentifier and use INT instead. It’s all about using the right type at the right time. You wouldn’t use VARCHAR(MAX) for a middle initial of a name or use BIGINT for an area code, right?
GUID vs. INT
I created two tables to observe the size difference. One table had a uniqueidentifier primary key as the clustering key and one had an INT primary key as the clustering key. I inserted 100000 rows into each and compared. The table with an INT Id is about half the size:
SELECT t.name AS 'Table' ,p.rows AS 'Rows' ,CAST(ROUND(((SUM(a.total_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS 'Total_MB' FROM sys.tables t INNER JOIN sys.indexes i ON t.OBJECT_ID = i.object_id INNER JOIN sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id GROUP BY t.Name, p.Rows; GO
Also, since the clustering key will be added to any nonclustered indexes as the lookup value, we’ll see higher page counts and fragmentation on any added indexes.
For a smaller table in a single database this isn’t as big of a deal. Where this can be a pain point is when uniqueidentifier is on a large table, for multiple tables in a database, and for multiple databases on a server.
If a table is created without a true need for uniqueidentifier and INT would be sufficient, disk space is being needlessly wasted. Maybe this isn’t considered as important for your organization. More disk space can be allocated for data files and you can move onto the next task. But what about backups and restores? Backups are going to be larger which means backups and restores are going to take more time to complete. On a smaller scale this might not be as concerning but as your databases grows it could become more costly.
Thanks for reading!