DISCLAIMER!
OK, BIG disclaimer here. This procedure is neither endorsed, nor supported by Microsoft and for small to middle size SMS deployments might not help at all! Plus this advice is offered as is with no warranty or suitability for any purpose. Make sure you have backups! Please see your doctor before starting any new exercize regimen. Not FDIC insured. Etc. etc. Basically, know what you're doing and don't sue me if it doesn't work. (It's sad that we're at a point in society where I have to write that for fear of what might happen).
Now, that scary warning out of the way, being unsupported by MS doesn't necessarily mean it's a bad idea. They have the very difficult task of making products that meet the needs of small, medium, large and gynormous businesses as best they can all at the same time. So if I've gotta do a little poking around myself to meet the needs of a gynormous shop, I'm fine with that. I don't believe that anything I'm going to suggest is actually dangerous or risky, (after all, I'm only talking about reordering the data) but it requires knowing that if push comes to shove, it may limit your support for certain things. Makes you want to read on, doesn't it?
THE PROBLEM
At Wells Fargo, we're one of the larger SMS shops out there. And while our size ranking doesn't really matter to me, the fact is that we have a ton of SMS clients, and along with that comes an SMS database with many tables that have millions upon millions of records (I saw a table that had over 260M rows). Even though we are very strict about what inventory items we turn on and the frequency of updates, we still have ~156GB of SMS data on our central reporting site (and climbing). I don't say this to show off or brag or prove that our Schwartz is bigger (yeah, stupid Spaceballs reference for those who didn't know), and there are plenty of other shops our size or bigger, but rather, I say this to illustrate the scale in which we're working in the hopes that someone else may recognize the same scale at their workplace and evaluate if they should maybe undertake something like this.
Now, SMS can handle deployments this big just fine, it doesn't cause problems with the application. In fact, compared to Altiris, Tivoli, Opsware, LanDesk and Zenworks, our experience is that SMS handles very large deployments the best of all of them. It does however require some special love when querying data and writing reports. If your queries are poorly written, they'll take forever and you'll probably tip over the reporting server before it ever finishes (I've seen it no less than 8 times in the last 6 months and I was responsible for one of those).
Here's the issue SMS has when you get this big: The SMS DB in general is built with a focus on the computer, more specifically the resourceID. When inserts, updates and deletes happen for normal SMS operations, the resourceID is usually the primary key for the tables and the table data is organized (via clustered indexes generated from primary key constraints) via the resourceID. That's great for normal SMS operations where inventory and discovery data comes in on a per-computer basis. But think about when you write a query or report against something like the Add/Remove programs view dbo.v_GS_Add_Remove_Programs. Typically, you write a report to pull back all of the machines that have (or don't have) a specific program, like "Microsoft Office Enterprise 2007". So if the table is sorted by resourceID, then the list of machines that have office 2007 may be physically stored anywhere in the millions and millions of rows in the DB and the SQL engine has to scan the entire table (or the entire index) for machines that match your criteria. By default the dbo.Add_Remove_Programs_DATA table only has 1 index, and it's a clustered index ordered by machineID and instanceKey columns (which in the view is the same as the resourceID and groupID), the SQL engine still has to pull the data from points all over the disk, and if you're normal, the disk IO is your biggest bottleneck in the SMS infrastructure.
So...one day that got me thinking: Since we query the view dbo.v_GS_Add_Remove_Programs by DisplayName0, Version0, and Publisher0, what if the underlying table was organized by those three columns in the clustered index?
THE SOLUTION (cheaters can find the code
HERE)
For the sake of this article, I'm using the Add/Remove programs table as the example. So let's open SQL Enterprise Manager and look at the indexes of dbo.Add_Remove_Programs_DATA:

FIGURE 1
Well, from figure 1 above we can see there's only the 1 clustered index. Remember, clustered index means the rows have been physically re-ordered based on certain columns like a phonebook. So let's double-click on it to see what columns are in the index.

FIGURE 2
Like I thought, MachineID (same as resourceID in the views) and InstanceKey (same as the GroupID). Pretty much all of the SMS tables are like this. Primary key indexes on 2 columns - which I'm sure works well with SMS internal operations which are resourceID based, but not so well when you've got 10s of millions of rows and you want to query just the rows containing a single product.
SIDE NOTE FOR THE GEEKS: Notice how in figure 2 everything's all greyed out? You can't change anything on these clustered indexes because it was created as part of a primary key constraint. What I mean by that is, when you have a new table, with no indexes or constraints of any kind, and first go to create a primary key constraint, SQL will create the constraint like you asked but what you may NOT know is that if there are no clustered indexes on the table yet, and if you don't manually define the constraint as "NONCLUSTERED", SQL will automatically make the constraint a CLUSTERED constraint and thus create a clustered index on the fields specified as the primary key constraint as well. Now the index is locked up so you can't modify it without first dropping the primary key constraint (not something you want to do during busy production hours) and making a new one.
Back to FIGURE 2. To give you a feel for what this single clustered index on MachineID and InstanceKey means for your queries, let me quickly show you the difference in the execution plan when you query on a field IN a clustered index, and NOT in a clustered index.
From SQL Enterprise manager, using the following 2 SQL queries in one query window, hit CTRL+L to get an execution plan:
SNIPPET #1
select *
from dbo.v_R_System
where Name0 = 'COMPUTER'
select *
from dbo.v_R_System
where Netbios_Name0 = 'COMPUTER'

FIGURE 3
Now this is a VERY simple example, and both of these queries are very small and fast and don't use many resources, but it does illustrate my point.
The Name0 field from v_R_System is NOT indexed, but the Netbios_Name0 field IS indexed. So, when you have a filter in your WHERE clause that looks for Name0 = 'COMPUTER', you'll notice that it performs a Clustered Index Scan, which means the SQL engine has to look at the Name0 field for EVERY ROW in the index (which, because it's a clustered index and thus the table is just re-ordered, we're talking about every row in the table) to see if it matches. However, when you use Netbios_Name0, which IS indexed, you'll notice it does an index seek (seek=fast) and key lookup (key lookup=fast), meaning basically that it knows exactly where in the table the data is, and can immediately go get it. The difference here is the query that's using a non-indexed field in the WHERE clause has 451 times the cost and I'm sure an increased disk/processing load as well (unquantified here).
The same principle is happening when we run a report or ad-hoc query that looks for all computers with a particular piece of software--the data in the add/remove programs (ARP) tables is ordered by resourceID, not by the display name, version or publisher so when you have millions of records to search through, SQL has a tough time getting it all quickly, and if your reports run slowly it reflects poorly on you and/or SMS even though it's not your fault.
At first I thought perhaps it would be worth our while to copy the data out of this table to my own table, denormalize the data, and get some good indexes on it, but when you're talking about a DB that's already 156GB and is super busy with normal daily operations, that didn't make sense. After all, we don't have that much space left on the server to be copying gigs and gigs of duplicate data without threatening full disks and backup failures, and then there's the issue of replication...how often do you copy data over to that table, and how long does it take? Does it lock the ARP tables while it's doing it? After going down that road, I realized our DB is too big, too busy and it was not worth it.
So the only other reasonable solution was to reorder the data so it's DisplayName-based not resourceID based.
In order to do that we need to do the following:
- Remove the clustered primary key constraint - the clustered index will disappear too - we can't get a new clustered index on there until this is gone
- Quickly recreate the primary key constraint as nonclustered so that no NULLs or duplicates can be introduced into key fields, but it isn't clustered
- Recreate a clustered index the way we want the data ordered.
Here's the code for all of that (remember, Microsoft has not done this, tested this, or recommended this, but I still believe it's a good idea for this table and for our size)
SNIPPET #2 (FINAL PRODUCT)
-- DROP CLUSTERED
PRIMARY KEY CONSTRAINT
ALTER TABLE [dbo].[Add_Remove_Programs_DATA]
DROP CONSTRAINT [Add_Remove_Programs_DATA_PK]
WITH (ONLINE = OFF) -- ON IS ONLY FOR
ENTERPRISE VERSION OF SQL
-- RE-CREATE
PRIMARY KEY CONSTRAINT WITHOUT CLUSTERING
ALTER TABLE [dbo].[Add_Remove_Programs_DATA]
ADD CONSTRAINT [Add_Remove_Programs_DATA_PK]
PRIMARY KEY NONCLUSTERED
(
[MachineID] ASC,
[InstanceKey] ASC
)WITH (PAD_INDEX = ON,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB
= ON,
IGNORE_DUP_KEY
= OFF,
ONLINE
= OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
FILLFACTOR
= 80) ON [PRIMARY]
-- CREATE NEW
CLUSTERED INDEX OUR WAY
CREATE UNIQUE CLUSTERED INDEX [_SDE_ARP_IDX] ON
[dbo].[Add_Remove_Programs_DATA]
(
[DisplayName00] ASC,
[Version00] ASC,
[Publisher00] ASC,
[MachineID] ASC,
[InstanceKey] ASC
)WITH (PAD_INDEX = ON,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB
= ON,
IGNORE_DUP_KEY
= OFF,
DROP_EXISTING
= OFF,
ONLINE
= OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
FILLFACTOR
= 80) ON [PRIMARY]
SUMMARY & NEXT STEPS
So there you have it, the data should now be reorganized by the DisplayName, Version, Publisher fields so that querying by them should be much faster. Try the following before and after and hit CTRL+L for execution plan.
select resourceID
from dbo.v_GS_Add_Remove_Programs
where DisplayName0 = 'Microsoft Office
Enterprise 2007'
Next, you may want to explore doing this on the Add_Remove_Programs_64_DATA table as well.
WARNING: Right now, the ARP/ARP64 tables are the only tables we've attempted this on with great success. If you attempt this on other tables, make sure you understand the consequences. For example, if you try this on the new INSTALLED_SOFTWARE_DATA tables, you might be disappointed by the results because of the way that table is joined to the LU_* tables in the views. You might make it faster to query the table directly by product name, but when the view joins to the LU_* tables based on resourceID, it'll now take LONGER to query. It's a lot more complicated than just re-ordering that table and I haven't taken the time to see if there's a good way to do that one yet. That's not to say that it's a bad idea, it just needs more thought. Same with any of the tables. Know the data, know how the data is used, know how often the data is updated, and know that there's a problem first before you just try to FIX something by reordering the data.
As always, I hope this was helpful and I welcome comments or questions.
Number2 (John Nelson)
MyITForum - Forum Posts
MyITForum - Blog
