February 2009 - Posts

ConfigMgr Collection Query Rule Embedded Reference Checker

Abstract

I have written a tool to check the embedded references in ConfigMgr query-based collection rules.

 

Background

A very common practice in designing query rules for ConfigMgr collections is to use the SMS_FullCollectionMembership WMI class to effect the inclusion or exclusion of members of other collections into the collection for which the query rule is being written.  There are numerous examples and discussions about this practice on other blogs, but briefly, here's an involved scenario with the example query that meets the need:

  • You have some generic collections:
    • You have a collection that you design that uses the software inventory to include all systems on which .NET Framework 2.0 is installed.  Let's say this is collection ID LAB00001.
    • You have another collection that does something similar for MSXML 2.0, collection ID LAB00002.
    • You have a collection that contains any system with less than 1G of memory installed, collection ID LAB00003.
    • You have a collection that contains any system that is under the control of the EMEA desktop group, collection ID LAB00004.
  • You need to create a collection for a software roll-out that the EMEA desktop group can manage, but that will not allow them to advertise to anyone that does not meet the requirements.  You might have guessed it: .NET 2.0, MSXML 2.0, at least 1G of memory, and within their locus of control.
  • Instead of duplicating the effort in each collection, you want to re-use the existing collections.
  • This also is a best practice because if you decide to refine the queries that drive the membership of any of those collections, you would only need to update it in one place.
  • You consider using the "Limit Collection Membership" feature, but realize that (a) this can only link the new collections' membership to 1 other collection where here we need 4, and (b) you cannot use this to effect one of the requirements: excluding the membership of another collection.
  • To solve this issue, you create two collections.  To one collection (e.g. LAB00005) you grant the EMEA team rights.  To the other collection, you advertise the software and use the following query rule that uses the SMS_FullCollectionMembership WMI class:

SELECT * FROM SMS_R_System

WHERE

AND ResourceID IN (SELECT ResourceID FROM SMS_FullCollectionMembership WHERE CollectionID="LAB00001")

AND ResourceID IN (SELECT ResourceID FROM SMS_FullCollectionMembership WHERE CollectionID="LAB00002")

AND ResourceID NOT IN (SELECT ResourceID FROM SMS_FullCollectionMembership WHERE CollectionID="LAB00003")

AND ResourceID IN (SELECT ResourceID FROM SMS_FullCollectionMembership WHERE CollectionID="LAB00004")

AND ResourceID IN (SELECT ResourceID FROM SMS_FullCollectionMembership WHERE CollectionID="LAB00005")

  • This produces a collection whose members have .NET 2.0, MSXML 2.0, are not in the <1G memory collection, are in the collection of systems under EMEA control, and are in the collection EMEA has been provided to control the distribution of this software release.
  • Note in all of these cases, you could have used the WMI class that is created specifically for each collection.  I don't think this is the common practice, but it achieves the same result.  This is important to point out, as the point comes up later in this post.
    • (i.e. the first sub-select could have been SELECT ResourceID FROM SMS_CM_RES_COLL_LAB00001).

 

The Problem

While this is a powerful technique and essentially something you can't do without in any active ConfigMgr environment, it is difficult to manage.  With a large number of collections that utilize this and the certain change in what is required for the types of collections I mentioned above as "generic," keeping track of these references can be difficult.  The purpose of a collection may change (e.g. .NET 2.0 becomes .NET 2.0 or higher, where something that required the collection when it was only .NET 2.0 is not compatible with >2.0)  The name may change with it.  A collection may be deleted and re-created for any number of reasons, which will not cause any alarm as embedded references are not tracked by the ConfigMgr Admin UI.

In spite of these issues, there is no facility for obtaining an overview of query rules with these references or, even more pressing, to decode the references to check that they still point to an existing, appropriate collection.

 

The Tool

I have written a tool that provides for both of these requirements: an overview and a reference checker.  The tool uses integrated SQL security to access the database views directly.  It is a console application that is invoked with two command line parameters: the server name and the SMS_??? database name.

The tool enumerates all existing site codes, collections, and query rules on that server.  It then uses regular expressions to detect probable collection references (i.e. any site code followed by 5 hex digits).  There is also a special case for the built-in collections, which do not correspond to a site code (starting instead with SMS) and do not conform to the strict standard for user-created collections (e.g. SMSDM001, SMS000GS, etc.)  Finally, it cross-references those collection IDs to the actual list of collections and includes them by ID and name for review.  Any collection ID that is found in a rule but not in the collections enumerated from the server is reported as "**** Unknown ****".  You can search the output for this to identify bad references.

 

Here's an example run:

CheckCollRef1

 

...and the output file...

CheckCollRef2

 

I think the tool and the output should be fairly self-explanatory from there.

Notes on the limitations of this tool:

  • We have a fairly large number of collections that utilize references at many client sites.  I've used this tool against all of them with no problems.  That having been said, there is still a possibility that some of you may have rules that fool the regular expressions I am using.  That highlights the limitation of pattern matching versus parsing, only the former of which this tool does.  I would be interested to see any instances of this, so please let me know.  The most likely candidate would be if your system or user naming convention is close (or identical) to the collection ID patterns in ConfigMgr.  If that were the case, collection query rules that reference such names could be mistaken for collection references.
  • This will not catch more generic references, like WHERE CollectionID LIKE "LAB0001%".  I'm guessing that if anyone is using these, it is a rare case.
  • It also will not catch references that are intrinsically broken, like an invalid Collection ID (e.g. LAB000X0 or LAB00000121).
  • It will not catch references to any Collection ID that does not correspond to any known site code.  Besides obvious typos, I can envision a valid scenario where this would be needed, but it would be very rare:
    • You have a child primary site 999.
    • You create a collection at that site (999?????).
    • You use that collection ID in a reference in a query rule of a collection built at a parent site.  This reference would return no ResourceIDs except when run on that particular child primary and any of its children.  This is where I say it's probably a very, very rare case.
    • You then delete site 999.
    • Any references to 999 collections are now broken, but since there's no longer a reference to site 999 in the database, the tool will not catch these.
  • It will not catch references to collections that are specified using the SMS_CM_RES_COLL_<CollectionID> classes.  I actually had this in the tool, but the underlying query that ConfigMgr generates uses these classes to effect the "Limit Collection Membership To" functionality.  This would have meant that the tool would not only identify query rules that had embedded references but also query rules for collections with the "Limit To" collection ID specified.  Since the latter is much more common and also double-checked by the Admin UI, I left their detection out of the tool.  As I mentioned above, I believe the most common practice is to use the SMS_FullCollectionMembership class, so this shouldn't pose much of an issue.
  • It will not catch references to other named objects such as packages, advertisements, etc.  While technically possible to use these in a WQL query, I'm guessing these, too, are rare.

All of these are probably rare.  If you have an appreciable number of instances of any of these exceptions, please let me know.  I'd be interested in working to evolve the tool to handle them accordingly.  Overall, though, I think you will find this to be a very useful tool.

 

 You can download build 15 of CheckCollRef here.

 

VJD/<><

Posted by vdipippo | with no comments
Filed under: ,

ConfigMgr Package Distribution Health Browser

Anyone that manages a ConfigMgr environment with any depth to its hierarchy usually spends a fair amount of time waiting for minijobs to replicate changes down the hierarchy.  Though not immediate, policy such as site control changes, advertisements, client agent configuration (inventory, discovery, etc.), and so forth do not usually keep you waiting too long to verify the success or failure of the replication down to the bottom of the tree.  One notable exception to this is package replication.  Especially for large packages, it is not uncommon for replication to take several hours, particularly if senders are configured with rate limits.  In an environment with a large number of packages and/or packages that update frequently, it can be quite an organizational challenge to find, check, and remedy replication issues on a regular basis.  This is even more true if you manage the ConfigMgr infrastructure but others create and maintain packages.  While the console is quite functional in checking the distribution status of packages in general, I wrote a tool that helps administrators zero in on package replication issues much more quickly.  First, it accesses the associated views in the database directly, which makes it much faster to get at this information than the console, which is a front-end to the ConfigMgr WMI provider.  Second, it offers some sorting and filtering options that highlight the replication issues much more efficiently than reviewing the total status for every package individually.

Here's the initial screen:

pdhb1

The File menu is simple enough.  It contains Connect and Exit.  If you click the Load button without using File | Connect, it also shows you the connect parameters (shown below).  The interface is also simple enough:

  • The left-hand pane (1) shows what you are investigating, controlled by the radio buttons (3).  If you investigate by Package (the default), it will show you packages on the left.  If you investigate by DP, it will show DPs on the left.
  • Clicking on anything in the left-hand pane shows the assignments that are in any of the selected assignment statuses in the right-hand pane (2), which are controlled by the three checkboxes on the right (5).
    • The default is to show only abnormal assignments.
    • If you are investigating by Package, clicking on a Package on the left shows all DPs that are in the selected statuses on the right.
    • If you are investigating by DP, clicking on a DP on the left shows any packages on that DP that are in the selected statuses on the right.
    • Normal and Abnormal are fairly self-explanatory, but "Missing" assignments may not be.  It is a quick way to see where packages are not assigned.  This is helpful for confirming whether the desired packages are on any new DPs.
  • The checkboxes on the left (4) determine the order of the list boxes.
    • The sort order applies notwithstanding the side on which the particular item is located.
    • Sorting DPs by Site Code is the default.  If this is un-checked, the DPs are sorted by server name.
    • Sorting Packages by Name is not the default.  The default is to sort packages by Package ID.
  • The final check box determines whether the tool pre-filters the left-hand list.  This is one of the central characteristics of this tool.
    • This is checked by default.
    • With this checked, only items in the left-hand list that have at least one assignment that would show in the right-hand list will be shown.
    • With this un-checked, all items in the left-hand list are shown.  Here, it is likely that you will click on something in the left-hand list and see nothing in the right-hand list.

So, to wrap-up, let me give you a couple of scenarios for the behavior of this tool:

  • With the defaults, loading from a DB will show packages in the left-hand list, sorted by Package ID, that have at least one DP that has an abnormal status.  Clicking on a Package ID would show all DPs to which that package is assigned with that abnormal status, sorted by Site Code.
  • If you change to Investigate by DP, it will show DPs in the left-hand list, sorted by Site Code, that have at least one package that has an abnormal status.  Clicking on a DP would show all Packages assigned to that DP with that abnormal status, sorted by Package ID.
  • If you switch back to Investigate by Package, un-check Show Abnormal... and check Show Missing..., you will see a list of packages that are not assigned to at least one DP.  This is usually normal, but with some packages you know need to be global, you can quickly see the issue and to which DP they need to be assigned.
  • With any of these, if you un-check Pre-Filter..., you will see all packages or all DPs in the left-hand list.
  • Etcetera, etcetera, etcetera!

Here's a snapshot of the tool in action at one of our customers.  I've blacked out the identifying information.

pdhb3

As promised, here is the connect box, which should be self-explanatory:

pdhb2

Please let me know what you think, if you encounter any issues with this tool, or have any suggestions.

You can download build 16 here.

Enjoy!

Vin/<><

Posted by vdipippo | 1 comment(s)
Filed under: ,

OpsMgr DW Timeouts and TableCount Utility

I recently fixed a problem at one of our clients that had a persistent problem with the DW write actions timing out.  After running the profiler and examining the underlying SQL, I found a few tables that were taking an inordinate amount of time to do simple operations.  This seemed to be a clear case of a fragmentation problem.  The biggest issue was that tables like ManagementPackVersion with 200 rows were taking over 3 minutes just to do a SELECT COUNT(*).  The remedy for these tables was to rebuild the indices using ALTER INDEX ALL ON <tableName> REBUILD.  Please check out the options and caveats regarding that command in the SQL documentation.

After fixing the tables that were causing the immediate issues, I wrote a small utility (TableCount.exe) that does a table count on all the tables in a database and reports the number of milliseconds that the operation takes on each.  This identified a few other tables that needed to have their indices rebuilt as well.  The utility takes two parameters: the server name and the database name.  It assumes integrated security at this point.  Here's some sample output (note the named instance of SQL):

C:\>TableCount OM07DB\TESTDWDB OperationsManagerDW

[Alert].[AlertResolutionState_135D9B096BF24191AD5E04D6C100DA4F]: 88738 row(s) retrieved in 15 ms
[dbo].[ManagementPackVersion]: 208 row(s) retrieved in 0 ms
[dbo].[StandardDatasetAggregationHistory]: 1589 row(s) retrieved in 0 ms
[Event].[Event_0A191F9853CD4927B76640AB8A8157F0]: 1801878 row(s) retrieved in 62 ms

...

C:\>TableCount OM07DB\TESTOPDB OperationsManager

[dbo].[DomainTableStatisticsUpdateHistory]: 1245 row(s) retrieved in 0 ms
[dbo].[MT_Computer]: 286 row(s) retrieved in 0 ms
[dbo].[MT_NetworkAdapter_0]: 929 row(s) retrieved in 0 ms
[dbo].[PerformanceData_27]: 1672410 row(s) retrieved in 312 ms
[dbo].[PerformanceData_28]: 1662302 row(s) retrieved in 218 ms
[dbo].[PerformanceData_29]: 1755413 row(s) retrieved in 594 ms
[dbo].[PerformanceData_30]: 1786974 row(s) retrieved in 359 ms
[dbo].[PerformanceSignatureData]: 193512 row(s) retrieved in 31 ms
[dbo].[PublisherMessages]: 1251885 row(s) retrieved in 156 ms
[dbo].[StateChangeEvent]: 338924 row(s) retrieved in 31 ms

...

There are other forensic techniques that you will find useful in this situation, especially related to querying the DMVs in SQL to see the actual issues, but those are well-documented elsewhere.  Also, the SELECT COUNT(*) operation is not sufficient for detecting every form of this issue.  None-the-less, this utility is pretty handy to get the counts and the count times in a single, quick snapshot.  It certainly found the problems in this customer's database, plus one other so far...

Feedback, as always, is most welcomed.

You can download build 2 of TableCount here.

VJD/<><

Posted by vdipippo | with no comments

Returning to the Blog!

In looking back at my blog, I can't believe it has been since July of 2008 that I have had a post.  I assure you that this is against my earnest desire to do so.  As with anything, I don't think I've successfully integrated blogging into my routine.  I suppose at 6 months that is an understatement!

That's about all I have to add to this particular post!  I will instead endeavor to post content from the work we do in System Center more frequently.

Thanks to all for the positive feedback we continue to get on the Lineage Explorer and the Workflow Primer module.

I hope to hear from you all again soon...

Vin/<><

Posted by vdipippo | with no comments
Filed under: