January 2011 - Posts
Client hotfixes in ConfigMgr have become an often-discussed topic of late, mainly because of the pre-R3 hotfix (977384) required on all managed systems. Microsoft recently published a good KB on this, titled System Center Configuration Manager 2007 Hotfix Installation Guidance. Near the bottom of this KB, they added this small query to help build collections to target the client hotfixes:
select * from SMS_R_System
inner join SMS_G_System_SMS_ADVANCED_CLIENT_STATE
on SMS_G_System_SMS_ADVANCED_CLIENT_STATE.ResourceID = SMS_R_System.ResourceId
where SMS_R_System.ClientType = 1
and SMS_G_System_SMS_ADVANCED_CLIENT_STATE.Name = "CcmFramework"
and SMS_G_System_SMS_ADVANCED_CLIENT_STATE.Version < "4.00.6487.2012"
Unfortunately, this query is not specific to any particular hotfix, and the hotfix KBs themselves do not list which component they update, although they do list version numbers if you know what to look for.
So my task was clear: figure out which components get updated version numbers. First, I used this little query to pull back all of the component versions in the environment I am currently working in:
SELECT
AC.Name0, AC.Version0
FROM
dbo.v_GS_SMS_ADVANCED_CLIENT_STAT AC
GROUP BY AC.Name0, AC.Version0
ORDER BY AC.Name0
And from there I was able to put together this little table for the following three client hotfixes (there aren’t a whole lot of client hotfixes so the table is short, but worth documenting and thus this blog post):
| KB | Component | Version |
| 977384 | SmsPowerManagement | 4.0.6487.2157 |
| 978754 | SmsOSDeployment | 4.0.6487.2115 |
| 2444668 | SmsInventory | 4.0.6487.2161 |
If you review the list of files affected in each of the above KBs, you will find the version numbers listed there although they are associated with files and not components in the KB.
Note that SmsPowerManagement does actually exist on the clients even if you haven’t installed 977384. This is because R3 functionality was actually shipped in SP2 and installing R3 merely enables it.
Clients with 977384 also have two other components bumped up in version:
| Component | Version |
| CcmFramework | 4.0.6487.2155 |
| SmsSoftwareDistribution | 4.0.6487.2154 |
I can’t find the above version numbers publicly documented anywhere though.
Using this info and the query you (and I) can now target or report on client hotfix deployment.
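As a rough sketch of how the table and the KB query fit together, the Python snippet below builds one WQL collection query per hotfix by swapping in the component name and version from the table. The helper name build_hotfix_query and the dictionary are purely illustrative. The version literal is compared as a string, exactly as in the KB query, so verify that its format matches what your inventory actually stores (the KB query writes 4.00.6487.2012, while the table above writes 4.0.6487.x) before relying on the results.

```python
# Component and version updated by each client hotfix, taken from the table above.
HOTFIX_COMPONENTS = {
    "977384": ("SmsPowerManagement", "4.0.6487.2157"),
    "978754": ("SmsOSDeployment", "4.0.6487.2115"),
    "2444668": ("SmsInventory", "4.0.6487.2161"),
}

# Same shape as the KB query, with the component and version left as placeholders.
QUERY_TEMPLATE = (
    "select * from SMS_R_System "
    "inner join SMS_G_System_SMS_ADVANCED_CLIENT_STATE "
    "on SMS_G_System_SMS_ADVANCED_CLIENT_STATE.ResourceID = SMS_R_System.ResourceId "
    "where SMS_R_System.ClientType = 1 "
    'and SMS_G_System_SMS_ADVANCED_CLIENT_STATE.Name = "{component}" '
    'and SMS_G_System_SMS_ADVANCED_CLIENT_STATE.Version < "{version}"'
)


def build_hotfix_query(kb: str) -> str:
    """Return a WQL query for clients whose component is older than the hotfix version."""
    component, version = HOTFIX_COMPONENTS[kb]
    return QUERY_TEMPLATE.format(component=component, version=version)


if __name__ == "__main__":
    for kb in HOTFIX_COMPONENTS:
        print(f"Clients missing KB{kb}:")
        print(build_hotfix_query(kb))
```

Each generated query can be pasted into a query-based collection rule; clients drop out of the collection once their inventory reports the newer component version.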
Inspired by a recent forum thread (and John Marcum), I put together a little test to verify whether ConfigMgr does indeed automatically retry advertised programs that failed. I created a simple one-line batch file and advertised it on my test client:
exit 999
This one line simply exits the batch file and returns the error code 999 which in this context is meaningless save for the fact that it is not a success code.
The results from execmgr.log on the client pretty much speak for themselves but do in fact verify that ConfigMgr will automatically retry a failed program:

Notice that it first sets the program status to FailureRetry and then WaitingRetry after the failure and that it actually tracks how many times the program has failed. This is important because as we’ll see, the number of times it will retry a program is fixed so that it doesn’t go on trying forever.
And right on cue, 15 minutes (and one second) later, ConfigMgr retries the program, with the same result in this case of course:

Another important thing to notice is this phrase “Non fatal execution error”. This too is very important because it suggests that there are also “Fatal execution errors” and that ConfigMgr treats them differently. Why would it do this? Because you don’t want it to simply retry every failure. If an installation executable or MSI is broken and throwing an error, re-running doesn’t help or change the resulting failure.
Below is the result of another simple script that returns 1 as an exit/error code. As you can see, ConfigMgr set the status to FailureNonRetry; it didn’t explicitly call the failure a “Fatal execution error”, but the implication based on comparing these scenarios is there. And of course, ConfigMgr will not retry this failed program.

So where are things like the retry interval defined, and what constitutes a “Non fatal execution error”? In the site control file, of course. Here’s a snippet from the sitectrl.ct0 file in my lab (this is the default as I have made no changes):
PROPERTY <Execution Failure Retry Count><REG_DWORD><><1008>
PROPERTY <Execution Failure Retry Interval><REG_DWORD><><600>
PROPERTY <Execution Failure Retry Error Codes><REG_SZ><{4,5,8,13,14,39,51,53,54,55,59,64,65,67,70,71,85,86,87,112,128,170,267,999,1003,1203,1219,1220,1222,1231,1232,1238,1265,1311,1323,1326,1330,1618,1622,2250}><0>
The meaning of these properties is self-explanatory, but one small comment to make is about the Retry Interval. Notice that it is defined as 600 [seconds] or 10 minutes, but the retry in my example above came after 15 minutes. This is because of the (default) 5 minute notification given to users when a mandatory advertisement is about to run: 10 + 5 = 15.
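The arithmetic behind those defaults can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: the 7-day figure is derived purely from multiplying the default count by the default interval and ignores the user countdown, so the real window may well be longer.

```python
# Defaults from the site control file snippet above.
RETRY_COUNT = 1008            # Execution Failure Retry Count
RETRY_INTERVAL_SECONDS = 600  # Execution Failure Retry Interval

# Default user notification before a mandatory advertisement runs (from the text above).
COUNTDOWN_SECONDS = 5 * 60

# Observed gap between attempts: retry interval plus the countdown.
gap_minutes = (RETRY_INTERVAL_SECONDS + COUNTDOWN_SECONDS) / 60

# Time covered by the retry intervals alone, ignoring the countdown.
retry_window_days = RETRY_COUNT * RETRY_INTERVAL_SECONDS / 86400

print(gap_minutes)        # 15.0
print(retry_window_days)  # 7.0
```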
That still leaves the question of why these specific codes are defined as “Non fatal”, and automatically retried, while others are not. The answer reveals itself when you look at the meaning of each of these codes (I’ll leave that as an exercise for you). They all have to do with infrastructure status or configuration and not directly with an internal failure of the command-line being run; they are essentially external to the command-line. Things like “Access is denied” (error code 5), “An attempt was made to establish a session to a network server, but there are already too many sessions established to that server.” (error code 1220), and “Another installation is already in progress. Complete that installation before proceeding with this install.” (error code 1618) are all outside the control of the command-line being run and thus may change the next time the command-line is tried. However, errors like “Incorrect function” (error code 1) and “The system cannot find the file specified.” (error code 2) are clearly issues with the command-line itself; no amount of retrying will change the result, so these are never retried automatically.
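To make the distinction concrete, here is a small Python sketch that parses the default error code list from the snippet above and predicts which of the two failure statuses a given exit code would land in. It only mirrors the behavior observed in the logs; the function names are made up for illustration and nothing here is ConfigMgr's actual implementation.

```python
import re

# The default "Execution Failure Retry Error Codes" property from the snippet above,
# joined onto one line.
PROPERTY_LINE = (
    "PROPERTY <Execution Failure Retry Error Codes><REG_SZ><{4,5,8,13,14,39,51,53,54,55,"
    "59,64,65,67,70,71,85,86,87,112,128,170,267,999,1003,1203,1219,1220,1222,1231,1232,"
    "1238,1265,1311,1323,1326,1330,1618,1622,2250}><0>"
)


def parse_retry_codes(property_line: str) -> frozenset:
    """Extract the comma-separated exit codes between the braces."""
    match = re.search(r"\{([\d,\s]+)\}", property_line)
    if match is None:
        raise ValueError("no {...} code list found")
    return frozenset(int(code) for code in match.group(1).split(",") if code.strip())


RETRY_CODES = parse_retry_codes(PROPERTY_LINE)


def failure_status(exit_code: int) -> str:
    """Predict the status for an exit code that is already known to be a failure."""
    return "FailureRetry" if exit_code in RETRY_CODES else "FailureNonRetry"


# The two test scripts from this post, plus the examples discussed above.
for code in (999, 1, 5, 1618, 2):
    print(code, failure_status(code))
```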
One last note is that simply removing an advertisement that is in the WaitingRetry status will indeed remove it from the target system (once the system receives the policy update of course) as evidenced by the log file snippet below.

Recently, at a client in production and in my home lab, I ran into an issue where the Software Updates task in my Build and Capture TS for Windows 7 would just hang. It would get to the task and scan for updates correctly according to the log and the UI, then it would just sit on “Installing Update 1 of x” forever (and never time out). Digging through the logs didn’t reveal much except for errors in the UpdateHandler.log with the error code 0x80040669. I have seen this error code before and even documented some fix actions (http://myitforum.com/cs2/blogs/jsandys/archive/2009/03/16/build-and-capture-and-software-updates.aspx), but those were not working even though they were in place.
After comparing settings, rebuilding packages, recreating Update Deployments, and searching the web, I finally found the issue: our old friend, the boundary. In the cases where this issue was cropping up, an AD Boundary was in use. Switching the boundary to an IP Range fixed this issue. I don’t recall specifically whether the IP Subnet was included in the AD boundary, but I think it was. Given that these task sequences, like all good build and capture task sequences, were executing on non-domain joined systems, the AD site was useless.
An excellent summary of all the fix actions (that I have ever seen) for Software Updates in a task sequence is at http://coreworx.blogspot.com/2010/08/configmgr-install-software-updates-task.html.
Just another reason why I refuse to use or recommend anything except IP Range boundaries (future blog post on that).
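Why an IP Range boundary is indifferent to domain membership can be sketched in a few lines of Python: membership is decided purely from the client's address, so a workgroup system in the middle of a build and capture matches just as well as a domain member. This is only an illustration of the idea, not how ConfigMgr evaluates boundaries, and the function name and addresses are hypothetical.

```python
import ipaddress


def in_ip_range(address: str, first: str, last: str) -> bool:
    """Return True if address lies within the inclusive range first..last."""
    ip = ipaddress.ip_address(address)
    return ipaddress.ip_address(first) <= ip <= ipaddress.ip_address(last)


# Hypothetical boundary and client addresses, for illustration only.
print(in_ip_range("10.1.0.25", "10.1.0.1", "10.1.0.254"))  # True
print(in_ip_range("10.2.0.25", "10.1.0.1", "10.1.0.254"))  # False
```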
Troubleshooting client agent health issues at my current customer, I wanted to eliminate all of the stale systems from AD so I didn’t waste my time on them (and of course the customer was no real help here). I decided to write a script to take a list of systems, check if a forward and a reverse DNS entry exists and also compare the DNS reverse entry (if it exists) to the name of the system as specified in the list. Using these checks, I can now identify systems that probably don’t exist anymore and can be deleted from or disabled in Active Directory thus allowing ConfigMgr to be cleaned up.
Sample output:
Name  IP        Reverse  Status
----  --        -------  ------
xyz1  10.1.0.1  abc5     IP registered to another system
xyz2  -         -        Could not Resolve IP
xyz3  10.1.0.3  xyz3     OK
xyz4  10.1.0.4  -        IP Address not found in reverse zone
The exact interpretation of each category is somewhat subjective and depends on the configuration of a particular environment, but in general, IP registered to another system and Could not Resolve IP are indicative of stale systems. Recall that AD System Discovery also does a forward DNS lookup on systems before it creates a DDR for them, so this script follows logic similar to the discovery; however, once a system is discovered, AD Discovery won’t remove it, hence this script. Also, AD discovery doesn’t do a reverse lookup because reverse zones may or may not be configured in any given environment.
The script is a PowerShell script and can be run on any system that can query the internal DNS. By default, it pulls the names of systems to check from a file called sys.txt in the same directory as the script; place each system name to query on a separate line.
Then run it from a PowerShell command prompt. To output the results to a CSV, pipe the output of the script to the Export-Csv cmdlet; e.g., .\IPCheck.ps1 | Export-Csv c:\IpCheckResults.csv
Download: IPCheck.zip
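For readers who just want the gist of the checks described above, here is a minimal Python sketch of the same logic. It is not the downloadable PowerShell script; the function names are invented for illustration, and comparing only short host names (ignoring case and DNS suffix) is an assumption on my part. The resolvers are passed in as parameters so the classification can be exercised without touching a real DNS server.

```python
import socket
from typing import Callable, Optional


def forward_lookup(name: str) -> Optional[str]:
    """Resolve a host name to an IPv4 address, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def reverse_lookup(ip: str) -> Optional[str]:
    """Resolve an IP address to a host name, or None if there is no reverse entry."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def check_system(
    name: str,
    forward: Callable[[str], Optional[str]] = forward_lookup,
    reverse: Callable[[str], Optional[str]] = reverse_lookup,
) -> dict:
    """Classify one system using the four statuses from the sample output."""
    ip = forward(name)
    if ip is None:
        return {"Name": name, "IP": "-", "Reverse": "-", "Status": "Could not Resolve IP"}
    reverse_name = reverse(ip)
    if reverse_name is None:
        status, shown = "IP Address not found in reverse zone", "-"
    else:
        # Assumption: compare short host names only, ignoring case and any DNS suffix.
        shown = reverse_name.split(".")[0]
        same = shown.lower() == name.split(".")[0].lower()
        status = "OK" if same else "IP registered to another system"
    return {"Name": name, "IP": ip, "Reverse": shown, "Status": status}
```

Calling check_system on each non-empty line of sys.txt and writing the resulting dictionaries out with csv.DictWriter would give roughly the same table as the sample output.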
Working at a client troubleshooting some weird ConfigMgr (not SCCM) client agent issues. Basically, there are a handful of systems, all laptops, that just dropped out of ConfigMgr. Reviewing the ExecMgr log shows that the last activity was the installation of Outlook 2010 via an advertisement that completed successfully requiring a reboot. After that, nada.
After reviewing a few logs, I came upon this in ClientIDManagerStartup.log repeating over and over:
RegTask: Failed to get certificate. Error: 0x80004005
Failed to find the certificate in the store, retry 1.
Failed to find the certificate in the store, retry 2.
Failed to find the certificate in the store, retry 3.
Failed to find the certificate in the store, retry 4.
Failed to find the certificate in the store, retry 5.
Already refreshed within the last 10 minutes, Sleeping for the next 9 minutes before reattempt.
This looked like a certificate issue so I opened up the certificate store using MMC. And something else strange: only one SMS certificate in the SMS store. I deleted this one and only SMS certificate and restarted the client agent. Same result.
The next step was to examine the actual file containing the certificates on the file system. Based on some research and this thread (http://social.technet.microsoft.com/Forums/en/configmgrsetup/thread/f5fd16e0-ca2a-40b0-9989-ee15da21f423), the file (on XP) is located at C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\MachineKeys and is always called 19c5cf9c7b5dc9de3e548adb70398402_50e417e0-e461-474b-96e2-077b80325612.sys. I checked the permissions as described in the forum thread, but those appeared correct. So I decided to delete the file and restart the agent. And like magic, all was right with the world again … at least for this agent. 10 to 20 more to go.
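With that many systems left, it helps to confirm the symptom before touching the key file. The Python sketch below simply counts the certificate failure lines in the text of a ClientIDManagerStartup.log; it assumes you have already copied the log from each suspect client, and the function name is made up for illustration.

```python
MARKER = "Failed to find the certificate in the store"


def count_certificate_failures(log_text: str) -> int:
    """Count log lines that contain the certificate lookup failure message."""
    return sum(1 for line in log_text.splitlines() if MARKER in line)


# Shortened sample taken from the log excerpt above.
sample = """RegTask: Failed to get certificate. Error: 0x80004005
Failed to find the certificate in the store, retry 1.
Failed to find the certificate in the store, retry 2.
Already refreshed within the last 10 minutes, Sleeping for the next 9 minutes before reattempt."""

print(count_certificate_failures(sample))  # 2
```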
I have no idea what caused these certificates to become corrupted. One thing I did notice on all these systems was “Malwarebytes”, so malware is a high probability. Another possibility is user “intervention”, as these were all laptops that had just finished installing Outlook 2010.