Sorting NULLs in ROLLUPs

01/07/2020

Have you ever tried to sort your data by columns with NULLs? SQL Server always puts them first when sorting ascending.

That's ok, you can kinda appreciate NULL is a lower value than zero or an empty string.

But what happens when you make use of the GROUP BY ROLLUP feature to include a total row? You want that total to be at the bottom, right?

Lets Create Some Data

                    
CREATE TABLE t_Purchases (
    ID              INT IDENTITY(1,1) NOT NULL,
    PurchaseDate    DATE,
    Description     NVARCHAR(1000),
    Value           DECIMAL(10,2)
);

INSERT INTO t_Purchases (PurchaseDate, Description, Value) VALUES ('2019-12-01', 'Cuddly Toy', 5.99);
INSERT INTO t_Purchases (PurchaseDate, Description, Value) VALUES ('2019-12-02', 'Big TV', 4000.00);
INSERT INTO t_Purchases (PurchaseDate, Description, Value) VALUES ('2019-12-03', 'Board Game', 9.99);
INSERT INTO t_Purchases (PurchaseDate, Description, Value) VALUES ('2019-12-03', 'Board Game', 19.99);
INSERT INTO t_Purchases (PurchaseDate, Description, Value) VALUES ('2019-12-04', 'Amazon Gift Card', 20.00);
INSERT INTO t_Purchases (PurchaseDate, Description, Value) VALUES ('2019-12-05', 'Socks', 9.99);
INSERT INTO t_Purchases (PurchaseDate, Description, Value) VALUES ('2019-12-06', 'Board Game', 29.99);

                

Note: Board Games were purchased on the same day and multiple days.

How much did we spend on Christmas this year?

                    
SELECT		 p.Description
		,SUM(p.Value)
FROM		t_Purchases p
GROUP BY ROLLUP (p.Description);
                

Ok, so our total row shows our ROLLUP, and the results are ordered ascending by default by the ROLLUP value 'p.Description'.

We seem to have lost a number of the Board Games rows, they've been GROUP'd into one row. We might have to use the ID value somewhere to keep our list complete.

What if we want to order by the date we made those purchases, leaving the Total row at the bottom?

Sorting by Date

Let's add an ORDER BY clause:

                    
SELECT			 p.Description
			,SUM(p.Value) AS Value
FROM			t_Purchases p
GROUP BY ROLLUP (p.Description)
ORDER BY		p.PurchaseDate;
                

Nope, it's not that simple.

Msg 8127, Level 16, State 1, Line 18

Column "t_Purchases.PurchaseDate" is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.

Simple, lets follow the instructions, right?

                    
SELECT		 p.Description
		,SUM(p.Value)
FROM		t_Purchases p
GROUP BY ROLLUP (p.Description, p.PurchaseDate)
ORDER BY	p.PurchaseDate
                

Wow, so much extra data!

Unfortunately, SQL will not allow you to sort ROLLUP data without it being in the GroupBy list. For this one, we have to wrap the sort around the original statement and filter out those extra rows.

Filtered View

                        
WITH LotsOfData AS (
	SELECT	 p.ID
		,p.PurchaseDate
		,p.Description
		,SUM(p.Value) as Value
	FROM		t_Purchases p
	GROUP BY ROLLUP (p.ID, p.PurchaseDate, p.Description)
)

SELECT		 l.PurchaseDate
		,l.Description
		,l.Value
FROM		LotsOfData l
WHERE		l.ID IS NULL 
OR		(l.Description IS NOT NULL AND l.PurchaseDate IS NOT NULL)
ORDER BY	(CASE WHEN l.PurchaseDate IS NULL THEN 1 ELSE 0 END)
		,l.PurchaseDate
                    

We have our data, yay!

But that code changed a lot there, lets take a look at the key elements.

Code that Makes it Work

The Original Select statement

We have to include the PurchaseDate in the ROLLUP because we need to include it in the SELECT list to be able to ORDER BY it later.

We have to include the ID in the ROLLUP to ensure each row is included as a unique item. Without this those Board Games would've been rolled up together.

If we added the PurchaseDate or ID in the GROUP BY clause, outside the ROLLUP, we'd get a subtotal for each Description and then PurchaseDate, but no final total.

Our CTE wrapper

We can wrap the original select statement in a Common Table Expression (CTE), sub-query or view. I chose a CTE here to keep things clear.

The WHERE clause

The WHERE clause here sorts filtering the extra rollups.

Our original SELECT statement returns ROLLUP lines for each dimension. Here, we only want one final total row.

'WHERE l.ID IS NULL' will include our Total row.

'WHERE (l.Description IS NOT NULL AND l.PurchaseDate IS NOT NULL)' will remove all of the extra ROLLUPs that we're not interested in.

And finally, our sort

The magic of sorting NULLs.

'ORDER BY l.PurchaseDate' would order everything as we want, except it would put the NULL/Total row at the top.

'ORDER BY (CASE WHEN l.PurchaseDate IS NULL THEN 1 ELSE 0 END)' adds an extra element to sort on. 1 if the PurchaseDate is NULL (it will be for our total row) or 0 if it's not.

SQL will sort by this extra field first, dropping the total row to the bottom.