Summary: Client health is a challenging computer management problem, as we've discussed before. Over the years Microsoft IT has developed a multi-faceted strategy, consisting of the following elements. It is an evolving story.
• Reports based on SMS client data (indicates client availability). History-based trends highlight anomalies that require intervention.
• Reports based on SMS Client Health Tool (CHT) data (indicates whether clients are online).
• Reports based on SMS client data and CHT data (indicates machines that are online but not available for management, and thus clients that are probably broken).
• A computer startup script, which checks for SMS client problems and corrects them if necessary (but only on computer restart). It is very flexible and powerful, and thus can address a wide variety of problems.
• Client Push Installation (CPI) for clients that are not available for management (have no reported inventory data, etc. for an extended period). This targets both offline and online computers, requires privileged server-to-client connectivity, and is executed only once per day at most.
• Manual client remediation for datacenter clients. Datacenter clients are high-value servers that are almost always accessible, so this is a practical (though still expensive) solution in this case (it would not be so practical for desktops).
• Manual client remediation for locked-down clients. Locked-down clients are tightly managed and precisely configured, so they can be found and remediated with a fairly high degree of success (though this is also expensive, and applies to a small minority of our clients). Most MSIT clients do not have these advantages.
• Manual client remediation for regular (non-locked-down) Microsoft IT clients when users complain. This can be done by users, helpdesk (tier 1), tier 2, tier 3 (SMS administrators), or tier 4 (SMS engineers), depending on the severity of the problem. The cost increases at each level. Users rarely complain, so this cost is fairly low.
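The "online but not available for management" report in the list above is essentially a comparison of two data sets. Here is a minimal sketch of that idea in Python (not the actual Microsoft IT reports). The machine names, the 14-day threshold, and the function name probably_broken are all hypothetical; real input would come from SMS client data and CHT data.

```python
from datetime import date, timedelta

def probably_broken(online, last_inventory, today, max_age_days=14):
    """Return machines that are online (CHT-style data) but have no
    recent inventory (SMS-style client data)."""
    cutoff = today - timedelta(days=max_age_days)  # hypothetical threshold
    stale = {
        name for name in online
        if last_inventory.get(name) is None or last_inventory[name] < cutoff
    }
    return sorted(stale)

# Hypothetical data: PC-003 has never reported inventory at all.
online = {"PC-001", "PC-002", "PC-003"}
last_inventory = {"PC-001": date(2007, 6, 1), "PC-002": date(2007, 4, 1)}

print(probably_broken(online, last_inventory, today=date(2007, 6, 5)))
# prints ['PC-002', 'PC-003']
```

Machines in the result are candidates for the remediation options in the list; machines that are neither online nor reporting are simply offline and are not flagged here.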
I've had the privilege of talking to a variety of organizations over the years about client health, and they use a combination of similar (but not usually the same) client health management options.
We are always looking for enhancements to our strategy, and I look forward to telling you about those in coming months. The main point for today is that you do have a variety of options available to you, so your client health strategy should be creatively based on whatever options are relevant to your needs and cost-effective for your environment and client health expectations.
Summary: SMS Trace (trace32.exe) is the core tool in every SMS administrator's toolkit. But do you really have time to be watching logs fly by? Why not script what you need to watch for and thus let your console do the watching?
SMS Trace has always impressed me - what kind of wonderful developer magic makes it possible to display new lines as they're added to logs in real time? What is the mysterious API that makes that possible? As a scripter I always wanted to hook into trace32 so it would 'shout' when the event I wanted would happen (highlighting helps, but can quickly scroll away). Or translate lines as they go by (what does that GUID mean?). Or relate one log to another (what does the scheduler do after the distribution manager does its tasks?)
Well today things built up to the point where I could no longer ignore this possibility. I wasn't particularly optimistic that vbscript could work such magic, but I really can't afford to be manually chasing rare yet important scenarios. With a bit of research I found that SMS Trace isn't quite as magical as we might think.
It turns out that long ago in the UNIX world they developed a tool called tail to show the last lines of a file. Then they added a "follow" option (tail -f) to show new lines as they are added to the file - sound familiar? Of course such tools are also available in the Windows world (for example as Tail.exe), and they're simple enough that the source code is often shared.
How do such tools work? Well it's actually pretty basic - check the file frequently (say every 1/4 second) to see if the file size has changed. If so, open it and jump to the point you last read. Read the rest of the file to the end-of-file. Display the results and repeat. Is that disappointing, or what? That's not magical. It sounds really inefficient (that's a lot of file opening for active logs), but it seems to work well, even in vbscript. I ran such code all day and it had no adverse effect on my console or server (I didn't monitor the network, but if SMS Trace does the same thing then we've been getting away with it for years).
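To make those steps concrete, here is a rough sketch of a single polling step in Python (purely for illustration, since the full sample in this post is VBScript). The function name read_new_data and the UTF-8 decoding are assumptions, not anything SMS Trace is known to do.

```python
import os

def read_new_data(path, prev_size):
    """One polling step: return (new_text, new_size) for a growing log file."""
    current_size = os.path.getsize(path)
    if current_size < prev_size:
        prev_size = 0  # the file shrank, so assume it rolled over
    if current_size == prev_size:
        return "", current_size  # nothing new since the last check
    with open(path, "rb") as f:
        f.seek(prev_size)  # jump to the point we last read
        data = f.read(current_size - prev_size)
    # Assumption: the log decodes as UTF-8; undecodable bytes are replaced.
    return data.decode("utf-8", errors="replace"), current_size
```

Calling this in a loop with a short pause between calls (a quarter of a second, say), and feeding the returned size back in as prev_size, gives the "follow" behavior.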
I'll include the code in a moment, but first a bit of the envisioning thing: it won't take a lot of scripting skills to build the real solutions around this kind of code. For example, it's Patch Tuesday (or Patch Wednesday if you're outside of North America), and you build your patch packages and deploy them. What's next? You watch the packages to make sure they get out to all your DPs (and thus your clients). Generally that goes well, but can you afford to just assume that it will be fine? If you work for a small to medium organization with a fairly stable and predictable environment the answer should be Yes. For the rest of us, we monitor everything closely in real time - with SMS Trace. So what if one of us writes the script to do that watching and shares it? Everyone benefits! The same can be done for all the other common SMS activities. So I hope some of you will experiment with this idea and then start sharing.
Code time (fully working sample this time):
'run with cscript.exe so that wscript.echo writes to the console instead of popping up dialogs
logfile = "replmgr.log" 'it changes a lot so it's a good example
Const ForReading = 1
Set fso = CreateObject("Scripting.FileSystemObject")
prev_size = fso.GetFile( logfile ).Size
do 'loop until the script is stopped (Ctrl+C)
    file_found = false
    try = 0
    while not file_found 'handle rollovers
        on error resume next
        current_size = fso.GetFile( logfile ).Size 'on file rollovers, when the file switches from .log to .lo_, this line could fail, momentarily
        if err=0 then
            on error goto 0
            if current_size < prev_size then prev_size = 0 'it must have rolled over but not been caught when checking the size
            file_found = true
        else
            on error goto 0
            prev_size = 0 'to guarantee no data loss we should get the end of the previous version of the file first
            try = try + 1
            if try=10 then wscript.echo "couldn't find file to get its size" : wscript.quit
            wscript.sleep 1000 'wait a second before looking for the next one
        end if
    wend
    'display changes in the file
    if current_size <> prev_size then
        Set f = fso.OpenTextFile(logfile, ForReading)
        f.skip( prev_size ) 'this can be slow on large files, especially at times of large changes to the file, but not bad (2 or 3 seconds at worst?)
        new_data = f.read( current_size - prev_size ) 'there are other ways to do this, but this seems to work best
        f.close 'release the file before the next check
        new_lines = split( new_data, vbCRLF )
        for each new_line in new_lines
            if new_line<>"" then wscript.echo new_line
        next
        prev_size = current_size
    end if
    wscript.sleep 250 'pause before checking again
loop
And what does the output look like? Just like SMS Trace. Run them side by side and you'll see the same output at the same time. Occasionally the script will hesitate, but even then only for a few seconds. Of course the real point is for you to add value by adding logic to only show the lines you want. Or in a more meaningful format. Or whatever.
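As one possible starting point for that added logic, here is a small sketch in Python (again for illustration only, not part of the VBScript sample). It takes a chunk of newly read log text and keeps only the lines that match a watch list. The patterns, the labels, and the sample log lines are all made up; substitute whatever your own logs actually say.

```python
import re

# Hypothetical watch list: regular expression -> label shown before the line.
WATCH = {
    r"\berror\b|\bfailed\b": "ALERT",
    r"\bsuccessfully\b": "ok",
}

def interesting_lines(new_data, watch=WATCH):
    """Yield (label, line) for each non-empty line matching a watch pattern."""
    for line in new_data.splitlines():
        if not line.strip():
            continue  # skip blank lines, as the VBScript sample does
        for pattern, label in watch.items():
            if re.search(pattern, line, re.IGNORECASE):
                yield label, line
                break  # first matching pattern wins

# Made-up log text standing in for one batch of newly read data.
sample = (
    "Scanning outbox for files\r\n"
    "Failed to connect to site ABC\r\n"
    "\r\n"
    "File sent successfully\r\n"
)

for label, line in interesting_lines(sample):
    print(f"[{label}] {line}")
```

The same shape works for the other ideas mentioned earlier: replace the label lookup with a GUID-to-name lookup to translate lines as they go by, or trigger a sound or an e-mail instead of printing when an ALERT pattern matches.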
Summary: WMI is key to Windows computer management, but experienced admins know that WMI itself can fail. The WMI team has been seriously investigating those failures and introduced changes in Vista (and the upcoming Windows Server 2008). Now those improvements are being backported to Windows XP and Windows Server 2003.
That news was announced at the MMS 2007 conference and was one of the big take-aways from that conference. Relief was finally imminent, though WMIDiag (a tool for WMI self-test and assessment - details below) was also a good step forward.
Early in my career here at Microsoft I had the pleasure of writing some WMI documentation, in addition to my SMS writing duties (in particular, I did the core WMIC documentation and helped with a few SDK topics). While chatting with the WMI team I was often struck with how thoroughly they supported developers with documentation and tools, but how little they did for administrators. To some degree that is natural - the WMI team were almost all developers, so they're going to sympathize with the needs of developers. Also, a system doesn't get well used (and thus doesn't have many significant operational issues) until developers develop something to take advantage of it. I tried to advocate on behalf of administrators like us, but can't say that I made much impact. However, over the years I have seen that the WMI team has gotten the message and has invested very significantly in addressing operational issues - a lot of great work has been done by some very smart people. These hotfixes and WMIDiag are key deliverables from that work.
Microsoft's Dan Conley gave a great update on these hotfixes on May 17th, which Rod Trent reported completely: http://myitforum.com/cs2/blogs/rtrent/archive/2007/05/16/wmi-corruption-hotfix-ready-kb-article-coming.aspx
What I haven't seen reported so widely was that the Server 2003 hotfix did come out on schedule on May 22nd and is available at this link: http://www.microsoft.com/downloads/details.aspx?FamilyID=94ce776e-a4da-4937-b2fa-3ec16495222e&DisplayLang=en
p.s. For your convenience, here's the KB article: http://support.microsoft.com/kb/933062
And while we're at it, here are some key WMIDiag links: