Percentiles with APPROX_PERCENTILE_CONT and APPROX_PERCENTILE_DISC

How do you go about finding the median percentile of a data set? What if you need the top x percentile? Both the APPROX_PERCENTILE_CONT and APPROX_PERCENTILE_DISC functions can be used to solve these questions.

Let’s look at how we can use each and what makes them unique.

Comic Prices

Let’s set up a table of test data in a ComicSales table. Yes, SalePrice is just an INT rather than something more precise like DECIMAL. More on that later. For now, here’s our test data:

APPROX_PERCENTILE_CONT

Which function you choose may be determined by how precise a percentile you need.

Let’s start with APPROX_PERCENTILE_CONT to retrieve the median SalePrice by specifying 0.50 ( we must specify between 0.0 and 1.0) with the function. Keep in mind that with APPROX_PERCENTILE_CONT, the median price or any other percentile you’re identifying doesn’t necessarily have to be an actual price from the table:

SELECT APPROX_PERCENTILE_CONT(0.50) 
	WITHIN GROUP (ORDER BY SalePrice) AS CONT_Median
FROM ComicSales;
GO

With our test data, we see this as the median comes to an approximate 5.5, which isn’t a SalePrice value that exists in our ComicSales table.

APPROX_PERCENTILE_DISC

Unlike APPROX_PERCENTILE_CONT, APPROX_PERCENTILE_DISC will always return a SalePrice that exists in our ComicSales table. If we use APPROX_PERCENTILE_DISC to find the median by once again specifying 0.50, we’ll get 5. This is the actual price for the “Incredible Hulk” and “Green Lantern” comics.

SELECT APPROX_PERCENTILE_DISC(0.50)
	WITHIN GROUP (ORDER BY SalePrice) AS DISC_Median
FROM ComicSales;
GO

Why Create the SalePrice as INT?

For this example, I went with INT because, as documented in the APPROX_PERCENTILE_DISC Microsoft article,

“Supported approximate numeric types are float and real. Decimal and float data types are not supported.”

I briefly considered switching to FLOAT for my examples. But, considering the imprecision issues of using FLOAT for money which could lead to problems in real-world scenarios, I thought it best to keep it simple with INT.

Thanks for reading!

One thought on “Percentiles with APPROX_PERCENTILE_CONT and APPROX_PERCENTILE_DISC”

Leave a comment