I was looking at the ever-popular (but strangely absent this summer
) Paul Thomsen's blog recently and I noticed this article entitled "Reporting on client version for multiple sites". His code does some cool stuff so I did what normal admins do and swiped his code for my own purposes. Then I did what any hard-core geek would do, I tried to think if there was any way I can improve on the code. After all, most really great products started out as good ideas that people have built upon over time. When I realized what he was doing, I thought I could come up with a way to get the desired results in a little different way and perhaps more efficiently. (I want to emphasise that Paul is one of the better engineers out there and my improving on his code is not meant to imply that he needs help with his SQL code, which he doesn't, it is intended to highlight that almost everything anyone writes, including everything I've ever written, can be improved upon and this gave me an opportunity to post about the SQL 2005 PIVOT statement and improve something at the same time).
So in dissecting the code, I noticed that there were a number of subqueries that all went back to the same table, I saw a NOT IN, which from a SQL-performance standpoint is something I try to shy away from if I can do it another way, and I noticed that really he was manually pivoting on the sitecode (pivoting is going from rows to columns, unpivoting is going from columns to rows). At Wells Fargo, we're using SQL 2005 so I figured this was a perfect opportunity to use the newly introduced PIVOT command. (Note: Don't try this on SQL 2000 or earlier as there isn't a PIVOT statement)
So here's the original code from Paul Thomsen (thanks Paul!) with our important site codes replacing his and I removed the case statement for the "v3 SP2 escrow" type stuff because it doesn't really mean anything more than the version number does to us:
SNIPPET #1SELECT client_version0 'version number',
(SELECT COUNT(DISTINCT name0)
FROM v_r_system sys2
JOIN v_ra_system_smsassignedsites ass2
ON sys2.resourceid = ass2.resourceid
AND sys2.client_version0 = sys1.client_version0
AND sms_assigned_sites0 = 'C20'
AND client0 = 1
AND obsolete0 = 0) 'C20',
(SELECT COUNT(DISTINCT name0)
FROM v_r_system sys2
JOIN v_ra_system_smsassignedsites ass2
ON sys2.resourceid = ass2.resourceid
AND sys2.client_version0 = sys1.client_version0
AND sms_assigned_sites0 = 'D20'
AND client0 = 1
AND obsolete0 = 0) 'D20',
(SELECT COUNT(DISTINCT name0)
FROM v_r_system sys2
JOIN v_ra_system_smsassignedsites ass2
ON sys2.resourceid = ass2.resourceid
AND sys2.client_version0 = sys1.client_version0
AND sms_assigned_sites0 = 'M20'
AND client0 = 1
AND obsolete0 = 0) 'M20',
(SELECT COUNT(DISTINCT name0)
FROM v_r_system sys2
JOIN v_ra_system_smsassignedsites ass2
ON sys2.resourceid = ass2.resourceid
AND sys2.client_version0 = sys1.client_version0
AND sms_assigned_sites0 = 'R20'
AND client0 = 1
AND obsolete0 = 0) 'R20',
(SELECT COUNT(DISTINCT name0)
FROM v_r_system sys2
JOIN v_ra_system_smsassignedsites ass2
ON sys2.resourceid = ass2.resourceid
AND sys2.client_version0 = sys1.client_version0
AND sms_assigned_sites0 = 'S33'
AND client0 = 1
AND obsolete0 = 0) 'S33',
(SELECT COUNT(DISTINCT name0)
FROM v_r_system sys2
JOIN v_ra_system_smsassignedsites ass2
ON sys2.resourceid = ass2.resourceid
AND sys2.client_version0 = sys1.client_version0
AND sms_assigned_sites0 = 'T20'
AND client0 = 1
AND obsolete0 = 0) 'T20',
(SELECT COUNT(DISTINCT name0)
FROM v_r_system sys2
JOIN v_ra_system_smsassignedsites ass2
ON sys2.resourceid = ass2.resourceid
AND sys2.client_version0 = sys1.client_version0
AND sms_assigned_sites0 NOT IN ('C20','D20','M20','R20','S33','T20')
AND client0 = 1
AND obsolete0 = 0) 'OTHER',
COUNT(DISTINCT name0) 'TOTAL'
FROM v_r_system sys1
JOIN v_ra_system_smsassignedsites ass1
ON sys1.resourceid = ass1.resourceid
WHERE client0 = 1
AND obsolete0 = 0
GROUP BY sys1.client_version0
ORDER BY 2 DESC
SAMPLE OUTPUT #1

This seemed to work fine and gives us a pretty good representation of what sites have what versions of the SMS client. But I wanted to replace all of those subquery SELECT statements from the code, I wanted to include the schema/owner names when addressing tables/views (see this article for more information) and I wanted to make sure the totals were going to add up properly every time (there were some rare cases where if a computername is assigned to multiple sites it wasn't totalling properly - someone wanna 'splain me how there can be multiple assignments?).
So to improve it, let's get down to the guts of it. We're essentially pulling the following data:
- Distinct computer names (for getting counts)
- SMS Version Numbers
- Assigned site codes
Let's write some SQL to pull that info (note, when addressing the views I use dbo. in front of them. This is for performance reasons. Again, see this article for more information).
SNIPPET #2
SELECT DISTINCT sys.name0,
sys.client_version0,
sites.sms_assigned_sites0
FROM dbo.v_ra_system_smsassignedsites AS sites
JOIN dbo.v_r_system AS sys
ON sites.resourceid = sys.resourceid
WHERE sys.client0 = 1
AND sys.obsolete0 = 0
SAMPLE OUTPUT #2

OK, this is the data we want, and if we were going to just count the number of distinct computernames per sitecode on the row, we'd just perform a group by right in this code for the assigned sites and the client version, but what we want to do instead is move the M20, D20, T20, etc. site codes over to column headers. What we need for that is a PIVOT.
So what we need to do for the PIVOT is turn our little query above into a derived table so the PIVOT statement can aggregate the data and move the site codes to columns. Then we just select from the table output of the PIVOT and we've got gold. That looks something like this:
SNIPPET #3
SELECT *
FROM
(SELECT DISTINCT sys.name0,
sys.client_version0,
sites.sms_assigned_sites0
FROM dbo.v_ra_system_smsassignedsites
AS sites
JOIN
dbo.v_r_system AS
sys
ON
sites.resourceid =
sys.resourceid
WHERE sys.client0 =
1
AND
sys.obsolete0 = 0)
AS sites
PIVOT
(COUNT(sites.name0)
FOR sites.sms_assigned_sites0 IN ( [C20],[D20],[M20],[R20],[S33],[T20])) as Pvt
Now, to explain this a little better:
We took the code from snippet #2 above and made a derived table out of it, aliased as "SITES". This alias and the output of the subquery will only be used internally by the PIVOT statement, so don't try to access it from anywhere else.
Then there's the PIVOT keyword, which I've heard is like saying " GROUP BY * " (even though that little gem doesn't exist).
Next there's the block that tells PIVOT what to do:
And lastly there's the SELECT statement around the whole thing.
NOTE: The block of code under the PIVOT is using sites.name0 not sys.name0. Remember I said we're aliasing the output from the derived table SITES and this alias is only consumed internal to the PIVOT statement.
Anyway, what the code for snippet #3 is pretty much saying is:
"Pull all of the non-obsolete SMS client names, client versions and assigned site codes and put into derived table SITES to be used by PIVOT,
Create columns named C20, D20, M20, R20, S33 and T20 and append them to the right of the columns pulled from that SITES derived table,
Using only rows with same site code as in the PIVOT's IN clause, count each computername from SITES grouped by assigned site code,
Count each computername from SITES grouped by SMS version as well,
Pivot the numbers into the appropriate columns based on the site code,
Reference the table output of all of this as 'Pvt',
Select all columns and rows from Pvt "
SAMPLE OUTPUT #3

Great! Now we've got the count of unique computernames for each client version grouped into columns for each site code we named in the PIVOT's IN clause. So, we're almost done. We've still got to rename "client_version0" to something more pretty, we have to include an OTHER column for any other site codes we didn't mention, we need a TOTALS column, and we have to avoid SELECT * for performance reasons.
So let's first add the OTHER column by changing the SITES derived table (in snippet #3 above) to categorize all site codes other than our standard 6 codes to OTHER
SNIPPET #4
SELECT DISTINCT sys.name0,
sys.client_version0,
CASE WHEN sites.sms_assigned_sites0 IN ('C20','D20','M20','R20','S33','T20') THEN sites.sms_assigned_sites0 --RETURN SITE CODE IF THEY’RE OURS
ELSE 'OTHER' --FOR ALL OTHER SITE CODES RETURN "OTHER"
END AS AssignedSites
FROM dbo.v_ra_system_smsassignedsites AS sites
JOIN dbo.v_r_system AS sys
ON sites.resourceid = sys.resourceid
WHERE sys.client0 = 1
AND sys.obsolete0 = 0
Now we have to include that column in the PIVOT's IN clause:
SNIPPET #5
PIVOT
(COUNT(sites.name0)
FOR sites.AssignedSites IN ( [C20],[D20],[M20],[R20],[S33],[T20],[OTHER] ) ) AS Pvt
Now let's replace the SELECT * with specifically named columns, including the new [OTHER] column that will come from the PIVOT, a better alias for the client_version0 column, and a TOTAL column
SNIPPET #6 (FINAL PRODUCT)
SELECT pvt.client_version0 AS [Version Number], --NEW ALIAS FOR client_version0
[C20],
[D20],
[M20],
[R20],
[S33],
[T20],
[OTHER], --HERE'S THE OTHER
[C20]+[D20]+[M20]+[R20]+[S33]+[T20]+[OTHER] AS TOTAL --HERE'S THE TOTAL, JUST ADD THE FIELDS WITH MATH
FROM (SELECT DISTINCT sys.name0,
sys.client_version0,
CASE WHEN sites.sms_assigned_sites0 IN ('C20','D20','M20','R20','S33','T20') THEN sites.sms_assigned_sites0
ELSE 'OTHER'
END AS AssignedSites
FROM dbo.v_ra_system_smsassignedsites AS sites
JOIN dbo.v_r_system AS sys
ON sites.resourceid = sys.resourceid
WHERE sys.client0 = 1
AND sys.obsolete0 = 0) AS sites
PIVOT
(COUNT(sites.name0)
FOR sites.AssignedSites IN ( [C20],[D20],[M20],[R20],[S33],[T20],[OTHER] ) ) AS Pvt
ORDER BY pvt.client_version0 DESC
So there you have it! Paul had something pretty cool started, but now we've made it sexy with PIVOT and benchmarked it and in our environment the performance & I/O gains are worth it too.
Original Pivot
COST 47 9
EXECUTION TIME 3700ms 1300ms
Whew, thanks for sitting through that one!
Number2 (John Nelson)
MyITForum - Forum Posts
MyITForum - Blog

UPDATED: Oh, one last note, notice in snippet #6 we're using "pvt.client_version0". That's because we're selecting all that stuff from the table output from the PIVOT which we've aliased as Pvt.
Think of the flow of data this way: The data from snippet #4 (SITES) flows through to the pivot in snippet #5 (PVT) which gets selected by the outer select of snippet #6. So we're not selecting the client_version0 from SITES (that only gets used by Pvt) we're selecting it from PVT so we must address it that way.
Number2 (John Nelson)
MyITForum - Forum Posts
MyITForum - Blog
