March 2008 - Posts
Summary: did you feel a disturbance in the computer management force this week? Some of us serving Microsoft IT felt it. A moment of silence is in order.
You might think I jest, but as a fellow techie you probably know that we develop a respect and even affection for hardware that sees a lot of history, doing a lot of important work with us. It's much like the way we respect great software (OpenVMS anyone?) or great books ("VAX/VMS Internals and Data Structures" still does it for me, but Charles Petzold's "Programming Windows" is a close second). Similarly, we respect great (even if rarely famous) techies of all sorts (too numerous to even begin listing) and great tools.
Over the years any techie will spend long hours with their main servers, often in the middle of the night and/or under stressful circumstances. No matter how good the hardware, failures will occur, and we have to wrestle the situation under control. If anything, we're amazed such problems don't occur more often. In so doing, we come to respect our allies - the hardware. It's easy to spend more time with the server than with our spouses.
Here at Microsoft IT we've had such a server, with sitecode "RDM" and server name "B11ITGSMS01". It has been the central site of our main hierarchy for many years.
One of my coworkers, Edward Bell, has written the following words in honor of RDM, which lost the last of its child sites (and thus clients) on Friday (March 28th, 2008). I'm pleased that he has allowed me to share those words with you:
"RDM is a battle tested warrior and should be retired with full military honors. The server has been in service longer than a number of the people working on SMS today have been at Microsoft.
Did you know…
• The RDM SMS central site server was placed into service around 2000 running SMS 2.0 with a pre-release version of SP2.
• RDM replaced a previous SMS 2.0 server with site code GBL for Global.
• RDM has been upgraded to the following SMS versions: SMS 2.0 SP2, SMS 2.0 SP3, SMS 2.0 SP4, SMS 2.0 SP5, SMS 2003, SMS 2003 SP1, SMS 2003 SP2, and SMS 2003 SP3.
• In 2002, RDM server hardware was an enterprise class SQL server with 8 CPUs and 8GB memory. I ordered and configured the server and it cost around $40,000. Yes, [even then it occasionally] still had SMS inbox backlogs.
• [At its peak] RDM managed more than a quarter of a MILLION (250,000) computers.
• RDM SQL database size was approximately 250GB.
• RDM has deployed millions of software package instances to Microsoft desktops.
Top 10 suggestions on what to do with RDM:
1. Declare RDM a god. Start a new religion. Rev Paul Thomsen will hold services Sunday from 10am to 12noon. We will be reading from the book of SMS 2003 Concepts, Planning & Deployment, Chapter 14, Upgrading to SMS.
2. Launch RDM on a Delta rocket into outer space. Maybe in a few hundred or thousand years, some alien civilization will encounter the server and marvel at our advanced technology.
3. Put RDM out to pasture. Charge a $100 stud fee to produce baby RDM offspring servers.
4. Donate the server to the Smithsonian Institute.
5. Use RDM as the world's largest artificial heart.
6. Place RDM in the Microsoft Museum. Tourists can learn about system management [as it was done] in the good old days.
7. On Halloween, drive RDM around to geeks' homes. Yell 'trick or treat' and frighten them with old server technology.
8. Use RDM as a DVD player to view Mitch Groeneveld’s DVD office collection [which is huge].
9. Convert RDM to a slot machine and sell RDM to a casino in Las Vegas.
10. Display RDM outside of the Tuk5 Datacenter as modern art. Just remember to power it down and unplug it.
Finally, we should retire the RDM Site code. No Microsoft SMS server can ever use the RDM site code again. This works for athletes. Why not servers?"
Another long-time colleague, Brian Wyne, shares those sentiments: "And part of me dies with it. I will miss it forever. Goodbye RDM."
UPDATE: I almost forgot these words, from Mike Church, another of our MSIT colleagues (and who spent many years in the SMS Product Group):
"RDM we’ve loved you so
But now it’s time for you to go.
You’ve been with us for oh so long
To say good bye to some seems wrong.
But as they say all things must end
So here’s so long to you old friend.
It’s out to pasture now for you
We need to move to something new.
Some are glad to see you gone
Others wish you’d just hang on
But out the door, you got the boot
Get along it’s time to scoot.
As all us old farts will some day
It’s time to retire, you just can’t stay.
Don’t go mad, just go, get out
And as the door slams HOORAY we shout."
And the person that predates all of us as a Microsoft IT SMS administrator, Cutter Smith:
"Ah the memories indeed.
Seems so long ago I drew the Visio for the infrastructure, wrote up the long Project plan, and placed that fateful purchase order for good old RDM…."
Summary: we talk about client health a lot on this blog, but ultimately we all want solutions. What solutions are available?
In a past posting I listed the solutions Microsoft IT uses for client health management. And occasionally I've talked about how to build reports. But of course there are more client health solutions. I'm sure I'll miss some, so please let me know what I'm missing. Here's my current list of computer management client health solutions:
- Reports
  - The first step in solving a problem is understanding the extent of it. Reports address that task
  - From this blog you can find queries that report client counts and client activity. They can be rendered as SMS or ConfigMgr web reports
  - If you're using the Client Health Tool (below), then it has a web report and Excel spreadsheet. Its data is also fairly good for distinguishing online broken clients and approximating offline client counts. Its ConfigMgr R2 equivalent even has standard ConfigMgr web reports ready to go!
- Computer startup scripts (or logon scripts)
  - As your computers start up, you can have them run a script that checks the health of the client and tries to repair it if need be
  - A computer startup script is preferred over a logon script because it runs in the system context, and as soon as the computer starts up (but asynchronously and in the background, so it doesn't delay user login).
  - As Brian says,
    DudeWorks (Rob Olsen et al) has a free download for the purpose, and they have a support forum to discuss it (myITforum.com occasionally has threads on it as well)
- Client Push Installation
  - This is a standard client deployment method for SMS and ConfigMgr
  - If you know which clients might be unhealthy, you can try pushing the client at them to see if re-installation helps (it does sometimes)
- Remote Scripts
- Manual Remediation
  - Yes, this is what we're all trying to avoid. It's expensive because it's time consuming and may involve travel. But it works, so for completeness we must keep it in mind
  - Most often you will find that the root cause is not in SMS or ConfigMgr itself. For example, it might be a WMI problem (see below). Fixing those issues will also help other software that depends on those components
  - Checking the state of the SMS or ConfigMgr service itself (ccmexec) is a good starting point. It may be stopped or disabled, which is easy to correct. Finding out why it was stopped or disabled may be trickier, as is correcting that problem
  - If the computer seems fine but ccmexec is not working, running the command ccmexec /repair may help
  - If the repair doesn't work, a simple client re-installation may help. Uninstalling and re-installing is another variation that sometimes helps
- Manual Investigations
- The SMS Client Health Tool
  - It's just a reporting tool (no repairs), but it provides some valuable data that is not available elsewhere. In particular, it scans management logs for client policy requests (which should happen hourly, by default). For those clients that don't request policies, it pings them to find if they're online, and if possible it will try to retrieve a few core details about the state of the client.
  - It was originally for SMS 2003 SP1, but I've used it successfully with every version of SMS and ConfigMgr since then
  - A new version will be available with ConfigMgr R2, called Client Status Reporting. I'll blog more about that soon.
- Collection-based Targeting
- Fallback Status Point (FSP) data
  - FSP is a new ConfigMgr system role that collects data on clients that have problems during installation or (in some cases) start failing to communicate with management points
- Maximize the number of clients that are online
  - If you want more clients to be active (to apply patches tonight, for example), then you can use IBCM and/or WOL
  - Internet-based Client Management (IBCM) is a ConfigMgr feature that allows your clients to be managed when they're outside of your corporate network, as long as they can access the Internet
  - Wake-on-LAN (WOL) is available in ConfigMgr and from third parties. Computers that are powered down can be woken up remotely
- Anything that improves server and environmental health
  - If your SMS or ConfigMgr servers are working less than perfectly, or the environment your clients are working in has issues, then your clients will be less active, and thus less healthy, than they could be
  - Consider using MOM's or OpsMgr's SMS or ConfigMgr management packs
  - Use good ITIL or MOF (Microsoft Operations Framework) policies and procedures to keep your servers running smoothly
  - No doubt other consultants offer similar services, as may your Technical Account Manager or Premier Support specialist, if you've signed up for such services
  - The community, including myITforum.com, the Microsoft forums and newsgroups, blogs, and other web sites offer advice on a wide variety of issues
- Anything that improves computer health
  - If the client computers themselves are running well, then that maximizes the opportunity for the SMS or ConfigMgr client to run well
  - WMI Improvements
    - Windows Management Instrumentation (WMI) is fundamental to ConfigMgr client activities, but historically it has had some reliability problems. It takes some time for them to develop, and they only occur on a small percentage of computers, but when we're aiming for 99.x% health, it doesn't take a lot to cause grief
    - The WMI team has taken this issue very seriously and produced a thorough solution in the form of the WMIdiag tool and its guidance. I've got more links for it in this old blog posting
    - The WMI team has also invested a lot of effort in understanding the root causes and made relevant improvements in Windows Vista. Those hotfixes have also been backported and are available for Windows XP and Windows Server 2003
- Guidance
  - Understanding the world of client health can be challenging. We all have an intuitive understanding, but the more we dig into it, the more challenges we find. So reading about client health will help to get you comfortable with this world
  - Rick Jones' and Chris Stauffer's documentation based on Chris Sugdini's collection-based solution (above)
  - Don Hite's blog occasionally has articles on client health
  - I hope this blog is useful
  - My presentation at MMS 2008 on client health will take a 'start at the beginning' approach and dive into as much detail as time allows. So I hope that will make for a cohesive story
    - Check it out at presentation SI21, "Advances in SMS 2003 and Configuration Manager 2007 Client Health Management", which is Tuesday, April 29 11:45 AM - 1:00 PM in the Titian 2303 room
    - Or see it on the DVD, when that's available
  - If you think more is needed, say so (to anyone that will listen, including in the comments of this blog)
Whew - that's a lot of options. You don't have to use all of them. Once you understand your needs, you can pick and choose the solutions that are appropriate for you.
And I hope you noticed that a lot of people have been working on client health, including Microsoft since September 2004. Together we are beating this issue.
Summary: patch scanning is normally a quiet behind-the-scenes activity that computer managers don't have to worry about. But that doesn't mean we shouldn't proactively look for worst-case scenarios.
Those of us that have been in the patch management business a couple of years or more will recall that sometimes patch scanning can be less quiet and behind-the-scenes than it should be. So we know that it's wise to watch patch scanning times. Even if there isn't a widespread issue, maybe there are some corner case scenarios we can identify and improve.
The following SQL script calculates the scan time for SMS 2003 clients:
-- get a sample set of relevant records to work with - for large hierarchies the whole table would be too large
SELECT top 100000 machinename, time, messageID into #temp
FROM v_StatusMessage s3 LEFT OUTER JOIN v_StatMsgAttributes AS att ON s3.RecordID = att.RecordID
WHERE att.AttributeID = 401 AND att.AttributeValue = '<patching advertisement ID>' AND messageID in (10005,10009)
group by machinename, time, messageID
-- put the 10005's (advertisement started) into a separate table, so the max time select won't be confused with the 10009's max time select
select machinename, time, messageID into #temp5 FROM #temp where messageID=10005
-- same for 10009's (advertisement successfully run, with details returned via status MIF)
select machinename, time, messageID into #temp9 FROM #temp where messageID=10009
--get the most recent records for the 10005's
select t1.machinename, t1.time, t1.messageID into #temp2
from #temp5 t1 join #temp5 t2 on t1.machinename=t2.machinename
group by t1.machinename, t1.time, t1.messageID
having t1.time=max(t2.time) order by t1.machinename
--same for the 10009's, and put them into the same temp table
insert into #temp2 (machinename,time,messageID)
select t1.machinename, t1.time, t1.messageID
from #temp9 t1 join #temp9 t2 on t1.machinename=t2.machinename
group by t1.machinename, t1.time, t1.messageID
having t1.time=max(t2.time) order by t1.machinename
--build a new temp table with just the times, so that the datediff calculation is easy
SELECT machinename,
( select time from #temp2 s1 where messageID=10005 and s1.machinename=s3.machinename) 'Start',
( select time from #temp2 s2 where messageID=10009 and s2.machinename=s3.machinename ) 'End'
into #temp3 FROM #temp2 s3 order by machinename
--look at the results, without the exceptional data (which is a smallish percentage, like 15%)
select datediff(s, [start], [end]) from #temp3
where [start] is not null and [end] is not null
and datediff(s, [start], [end]) >0 and datediff(s, [start], [end]) <2000
order by datediff(s, [start], [end])
--the all important average
select avg(datediff(s, [start], [end]) ) from #temp3
where [start] is not null and [end] is not null
and datediff(s, [start], [end]) >0 and datediff(s, [start], [end]) <2000
This script is also a good example of the usefulness of temporary tables. I don't pretend to be a SQL guru, but I like techniques that allow non-gurus to accomplish complex tasks using SQL alone.
As usual, I can't guarantee that this is the only or best way to accomplish this goal. But it has worked well for me and seems to return accurate results. Now I have to come up with a ConfigMgr equivalent...
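For readers who want to check the logic of the script above outside the database, here is a minimal sketch in Python rather than SQL. It assumes the (machinename, time, messageID) rows have already been exported from the site database. The function name, the sample rows and the whole-second truncation are illustrative assumptions, not part of SMS, and SQL's datediff counts second boundaries slightly differently.

```python
from datetime import datetime
from statistics import mean

STARTED, SUCCEEDED = 10005, 10009  # status message IDs used in the SQL script


def scan_durations(rows, max_seconds=2000):
    """Return {machine: seconds} between the latest 10005 and the latest 10009.

    rows is an iterable of (machinename, time, message_id) tuples.
    Machines missing either message, or whose duration is not within
    0 < seconds < max_seconds, are left out, mirroring the SQL filter.
    """
    latest = {}  # (machine, message_id) -> most recent timestamp seen
    for machine, when, message_id in rows:
        if message_id not in (STARTED, SUCCEEDED):
            continue
        key = (machine, message_id)
        if key not in latest or when > latest[key]:
            latest[key] = when

    durations = {}
    for (machine, message_id), start in latest.items():
        if message_id != STARTED:
            continue
        end = latest.get((machine, SUCCEEDED))
        if end is None:
            continue  # started but never reported success
        seconds = int((end - start).total_seconds())
        if 0 < seconds < max_seconds:
            durations[machine] = seconds
    return durations


# Hypothetical sample rows, not taken from a real site database
sample = [
    ("PC1", datetime(2008, 3, 1, 2, 0, 0), 10005),
    ("PC1", datetime(2008, 3, 1, 2, 5, 0), 10009),
    ("PC2", datetime(2008, 3, 1, 3, 0, 0), 10005),    # never finished
    ("PC3", datetime(2008, 3, 1, 4, 0, 0), 10005),
    ("PC3", datetime(2008, 3, 1, 4, 1, 40), 10009),
    ("PC3", datetime(2008, 2, 1, 4, 0, 0), 10005),    # older run, superseded
]
durations = scan_durations(sample)
average = mean(durations.values()) if durations else None
print(durations, average)
```

Keeping the per-machine durations in a dictionary, rather than only the average, makes it easy to pick out the slowest machines, which is where the corner cases tend to hide.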
Summary: The ConfigMgr software development kit is finally released to a public URL.
Actually, it hasn't been hard to get ahold of the SDK (via the Connect (beta) program). But that can be a bit of a hassle, and while the SDK is being worked on there's obviously going to be some content that is missing or tentative.
In this case I know the SDK has received a LOT of work from some top-notch people. I'm sure you'll be impressed by the sheer size of it. There's got to be lots of good stuff in it.
I have a bit of programming experience (well, 5 years as a professional programmer, at a few companies). And I certainly like to get creative when solving computer management problems (thus the blog, and similar efforts in the past). An SDK is the ultimate inspiration for techniques to address things that I would like to improve. So this is very exciting for me - I hope it is for you too.
http://www.microsoft.com/downloads/details.aspx?FamilyId=064A995F-EF13-4200-81AD-E3AF6218EDCC&displaylang=en
Summary: SMS 2003 and ConfigMgr 2007 have an often useful feature called protected distribution points. But a site can have many protected DPs, and the locations that correspond with each DP will be of various sizes, so how do you know how many clients you have for each DP? How do you know if any of them are possibly supporting too many clients?
With protected DPs, you can't just divide the site size (in client count) by the number of DPs to get the clients-per-DP ratio, as you would with regular DPs. Each protected DP only serves the clients within its boundaries, and the number of clients in those boundaries varies widely. So you'll need a query that relates the clients to the boundaries of each DP. If you're using ConfigMgr, that's relatively easy.
(The ideal way to determine excess clients per DP is to monitor relevant performance counters during your worst-case deployments, but that's very labor intensive and has to be timed just right, so a cruder approximation is useful.)
If you are using AD sites as your boundaries for the protected DPs, this query will give you the answer:
select servername, sitecode, count(distinct name0) 'clients'
from ProtectedSiteSystem_ARR PSS join v_BoundaryInfo bound on pss.BoundaryID=bound.BoundaryID join v_R_System sys on bound.value=sys.AD_Site_Name0
where client0=1 and obsolete0=0
group by servername, sitecode
order by count(distinct name0) desc
If you're using IP subnets as boundaries, this will be the query:
select servername, sitecode, count(distinct name0) 'clients'
from ProtectedSiteSystem_ARR PSS join v_BoundaryInfo bound on pss.BoundaryID=bound.BoundaryID join v_RA_System_IPSubnets subs on subs.IP_Subnets0=bound.value join v_R_System sys on sys.resourceID=subs.resourceID
where client0=1 and obsolete0=0
group by servername, sitecode
order by count(distinct name0) desc
If you're using both kinds of boundaries, the query is left to the reader ;-)
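For the mixed case, one possible approach is to match each client against both kinds of boundary and count it once per DP. The sketch below shows that idea in Python rather than SQL, on in-memory data. It assumes you have already pulled the DP-to-boundary rows and the active, non-obsolete clients from the database. The function name, field names and sample values are hypothetical, and the case-insensitive comparison is an assumption.

```python
from collections import defaultdict


def clients_per_dp(dp_boundaries, clients):
    """Count distinct clients that fall inside each protected DP's boundaries.

    dp_boundaries: iterable of (servername, boundary_value) pairs, where the
                   value is either an AD site name or an IP subnet.
    clients:       iterable of dicts with 'name', 'ad_site' and 'subnets'.
    Returns a list of (servername, client_count), largest count first.
    """
    servers_by_boundary = defaultdict(set)  # boundary value -> DP server names
    for server, value in dp_boundaries:
        servers_by_boundary[value.lower()].add(server)

    members = defaultdict(set)  # DP server name -> distinct client names
    for client in clients:
        values = [client.get("ad_site"), *client.get("subnets", [])]
        for value in values:
            if not value:
                continue
            for server in servers_by_boundary.get(value.lower(), ()):
                members[server].add(client["name"])

    counts = {server: len(names) for server, names in members.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data
dp_boundaries = [
    ("DP-A", "Redmond"),    # AD site boundary
    ("DP-A", "10.1.0.0"),   # IP subnet boundary
    ("DP-B", "10.2.0.0"),
]
clients = [
    {"name": "PC1", "ad_site": "Redmond", "subnets": ["10.1.0.0"]},  # matches DP-A twice
    {"name": "PC2", "ad_site": "Dublin", "subnets": ["10.1.0.0"]},
    {"name": "PC3", "ad_site": "Dublin", "subnets": ["10.2.0.0"]},
]
result = clients_per_dp(dp_boundaries, clients)
print(result)
```

Using a set per DP is what keeps a client from being counted twice when it matches both an AD site boundary and a subnet boundary of the same DP, which is the same job count(distinct name0) does in the queries above.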
Of course, no solution is perfect. The subnets or AD site data for each client is dependent on the latest discovery data, so if you're running discovery infrequently then you may count some clients that have moved away, or not count some clients that have moved into the boundaries.
UPDATE: If you're using SMS 2003, then the above technique won't work because it doesn't have the ProtectedSiteSystem_ARR table. Thanks to Yanze for pointing that out in the comments (I am rather ConfigMgr-centric these days). Russ Slaten's blog has a script that looks up protected DPs and their boundaries via the site control file. You could fairly easily extend that script to pull in the number of clients in each boundary, thus getting the same result as my queries above. That will work for both SMS 2003 and ConfigMgr.
p.s. A query best practice is to always use the views. But in this case the most useful information seems to be in the ProtectedSiteSystem_ARR table - I couldn't find an equivalent view. So I've broken with best practice in this case. In future versions of ConfigMgr we may have to tweak these queries.
p.p.s. As is almost always the case, this approach is based on my own analysis of the possible solutions, so I (and Microsoft) don't guarantee this is the best possible solution for this problem. But it seems to be working well so far. Your thoughts are greatly appreciated - I'm always looking for better solutions.
Summary: users have good reasons to keep their e-mail distribution list memberships up-to-date, so they're a great way to accurately target software distributions. Here's how you can create collections based on DLs (by which I mean Active Directory distribution lists, as would be used by Microsoft Exchange and Outlook).
I've always been a fan of distribution lists for keeping track of users. People care a lot about receiving e-mails that are applicable to the groups and teams they belong to, but they don't want spam, so they'll add themselves to and remove themselves from DLs as needed, without a long delay. Any other targeting mechanism is going to be dependent on details that users don't care a lot about (like OUs, or hardware configuration), lists made up by third parties, broadcasting, or being reactive ("I need this software, please send it to me now!").
So if you can do software distributions to collections based on DLs, you're very likely to hit the right people (more exactly, the machines of the right people). Similarly, any reports based on those collections will accurately portray the relevant machines.
Here’s the query for a collection that targets a DL close to my heart:
select * from SMS_R_User where SMS_R_User.UserGroupName like "%SMS Admins & Engineers - Desktop Mgmt"
You have to use the display name of the DL, not its alias. And I use the “%” so that I don’t have to worry which domain the group is in.
That’s for the users in the DL, of course. If you want to target the computers of the users, you can build a second collection using a query like so:
select * from SMS_R_System where LastLogonUserName in (select UserName from SMS_R_User where Name in (select Name from SMS_CM_RES_COLL_CEN00315))
So that’s a lookup via a subselect to the collection above and then a little translation of username format (via SMS_R_User). Of course it assumes the LastLogonUserName is the user of the machine (true about 90% of the time, at least within Microsoft).
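To make the two-step lookup concrete, here is a small sketch in Python on in-memory sample data. It is not how the collection evaluator works internally. It only mirrors the logic of the two queries: a suffix match on the group name (the LIKE with a leading "%"), then a match of each machine's last logged-on user against the users found. The field names, the DOMAIN\group value format and the case-insensitive comparison are assumptions for illustration.

```python
def users_in_dl(users, dl_display_name):
    """Users with at least one group whose name ends with the DL display name.

    Mirrors: UserGroupName like "%<display name>", so the domain prefix
    in front of the group name does not matter.
    """
    suffix = dl_display_name.lower()
    return [
        user for user in users
        if any(group.lower().endswith(suffix) for group in user["groups"])
    ]


def machines_for_users(machines, dl_users):
    """Names of machines whose last logged-on user is one of dl_users."""
    wanted = {user["user_name"].lower() for user in dl_users}
    return sorted(
        machine["name"] for machine in machines
        if (machine.get("last_logon_user") or "").lower() in wanted
    )


# Hypothetical discovery data
users = [
    {"user_name": "alice", "groups": ["CORP\\SMS Admins & Engineers - Desktop Mgmt"]},
    {"user_name": "bob", "groups": ["EMEA\\SMS Admins & Engineers - Desktop Mgmt", "EMEA\\Sales"]},
    {"user_name": "carol", "groups": ["CORP\\Sales"]},
]
machines = [
    {"name": "PC-ALICE", "last_logon_user": "alice"},
    {"name": "PC-BOB", "last_logon_user": "BOB"},
    {"name": "PC-CAROL", "last_logon_user": "carol"},
    {"name": "PC-KIOSK", "last_logon_user": None},
]
dl_users = users_in_dl(users, "SMS Admins & Engineers - Desktop Mgmt")
targets = machines_for_users(machines, dl_users)
print(targets)
```

The machine with no recorded last logon user is simply skipped, which is the same blind spot the collection query has: a machine only shows up if its last logged-on user is in the DL.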
Notes:
1) SMS_R_User (the WMI class), and its corresponding SQL view v_RA_User_UserGroupName, are populated by the AD User Discovery method. So you'll have to enable it, if you haven't already. The adusrdis.log will even show the groups for each user as their DDR's are created. Those details can be handy for troubleshooting, if you're not seeing the results you expect.
2) when you're building the first collection above, make sure the resource type is users, rather than user groups. It's true that you're selecting users based on the user group they're in, but that's different from building a collection of user groups. That's especially true if you manually build the collection (as most people usually do), rather than pasting my query above into the collection (and then changing the group name, of course). (That actually gets me thinking - what would you use a collection of user groups for? Any thoughts?)
3) to determine the 'display name' for the distribution group, I usually select it from an e-mail to the group. That way I'm sure I've got the right group with the right spelling. But if you're testing this idea in a lab then you might not have Outlook (or equivalent) available. For example, you might create the distribution group in the operating system's Active Directory Users and Computers administrative tool. In that tool, the "name" is the display name, and the "group name (pre-Windows 2000)" is what I would call the "alias". So you would use the 'name' in the first query above, not the 'group name'.
Summary: here's a technique for collecting details about ActiveX controls on your computers.
Do you have a need to inventory the ActiveX controls on your computers? Collecting details about ActiveX controls is not trivial. Many have a .OCX extension, but not all of them. And even if you find them on disk, are you confident they've been installed for use? So a simple software inventory rule won't do the trick.
I built an SMS / ConfigMgr hardware inventory extension for ActiveX controls a few years ago, and a little research today didn't reveal a better solution. If you have one, I'd be pleased to hear about it (as would others, I'm sure).
The solution basically involves sending a script like this via software distribution to your clients:
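' List the ActiveX controls shown in the "Downloaded Program Files" shell folder
Set fso = CreateObject("Scripting.FileSystemObject")
Set wshShell = CreateObject("WScript.Shell")
Set Appshell = CreateObject("Shell.Application")
' Resolve the Windows directory from the environment
windir = wshShell.ExpandEnvironmentStrings("%windir%")
Set folder = Appshell.NameSpace(windir & "\Downloaded Program Files")
Set items = folder.Items
For Each item In items
    ' Some entries have no file path, so only look up a version when there is one
    version = ""
    If item.Path <> "" Then version = fso.GetFileVersion(item.Path)
    WScript.Echo item.Name, version
Next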
The "Downloaded Program Files" folder can look like an ordinary folder (depending on how you look), but it's actually a special object that is used to manage installed ActiveX controls.
The script for your hardware inventory extension will be a little more complex than the code sample above. Rather than displaying the control name and version, you'll need to write it to a custom WMI class, and get SMS or ConfigMgr hardware inventory to collect that WMI class. That's a bit of an art in itself, but I hope it's well described in this chapter of the SMS 2003 Operations Guide:
http://www.microsoft.com/technet/prodtechnol/sms/sms2003/opsguide/ops_0351.mspx?mfr=true
One concern with the solution I suggest is whether all ActiveX controls end up in that "Downloaded Program Files" object. Maybe only Internet Explorer ActiveX controls make it there. I really can't confirm that either way. If that's an important scenario for you, and you have non-IE ActiveX controls, it would be worth testing (and let us know).
BONUS MATERIAL: do you need to run scripts that use 32-bit objects on 64-bit computers? That's the case here, where "Shell.Application" is 32-bit only. It doesn't error out, but it doesn't return any results on 64-bit computers. When that's the case, you should run the script using the 32-bit version of cscript.exe, like so:
%SystemRoot%\syswow64\cscript.exe script.vbs
p.s. An alternate solution for this hardware inventory extension is to use the MicrosoftIE_Object class from the root\cimv2\Applications\MicrosoftIE WMI namespace. Even “the Scripting Guy” suggests it: http://www.microsoft.com/technet/scriptcenter/resources/qanda/dec04/hey1220.mspx. But that namespace doesn't seem to exist on Windows Vista or Windows Server 2008 computers, so it no longer seems to be a viable option.