Jason Condo at myITforum.com

Ramblings of loose mind - if it deals with workstation or server management, I'm there!

Catastrophic Failure has occurred....

During the process of adding a Gateway server for an Operations Manager environment, I received the following error while adding a certificate using MomCertImport.exe.

The certificate is valid, but importing it to certificate store failed.
Error description: Catastrophic failure
Error code:0000FFFF

Now to me, a catastrophic failure makes me think the server is going to burst into flames. While I doubt that will happen, I think someone had a good sense of humor when they wrote that message.

Some quick research didn't help me resolve why I got this error. The usual response was to ask whether the users having this issue had stepped through some magical documentation, as if that would inspire some fairy-dusted insight and everything would just click. While documents with instructions are great, they don't help a lot if you don't understand what you are clicking on, or what to do when your prompts or environment aren't as described in the document.

This is not to say that the following two resources are not useful, as they are quite so. I would recommend both of them, although it is also good to understand PKI and certificate management before blindly following instructions on setting up your environment. I mean, how do you know that what some document on the web is telling you to do isn't going to open you up to vulnerabilities, or isn't just plain malicious?

For securing your OpsMgr environment, refer to System Center Operations Manager 2007 Unleashed, Chapter 11, Obtaining a Certificate. It has great step-by-step instructions, although I did find some missing steps. Additionally, this web doc was a good resource and built upon the knowledge from the above book, although it leans even more on step-by-step instructions, which could require blind faith to implement if you are not already familiar with the processes. Great for a lab, but not for blindly building your production environment. http://www.systemcenterforum.org/ops-mgr-2007-gateway-server-and-pki-scenarios-document/

Anyway, I easily resolved the issue and have a couple of words of wisdom from my trials. Note that all of the following information assumes that you have read the documents above. MomCertImport offers two options.

The first (given in both the examples above) uses the exported PFX certificate file. This is used by opening a cmd shell and executing: "MomCertImport <path to PFX Cert file>" with the optional "/Password <password>" added to the end. If you don't specify the "/Password" parameter, the tool will just prompt you for it. Mind you, if you export the Cert without specifying a password, you will not be able to import it because the utility will not take NULL passwords.

The second option is to use the "/SubjectName" parameter for the tool. This allows you to use the installed cert without exporting it to a file, which avoids exposing your cert to compromise if you have a simple password (I bet half of you use some variation of "password" when exporting) and the file falls into the wrong hands. The command would be like "MomCertImport /SubjectName <subjectname>". The subject name comes from the cert, and you can see it by viewing the cert's details in the MMC. (Note that you gave the subject name when creating the request for your cert; it was the FQDN of the server.)
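The two invocation styles can be sketched as a small Python helper that only assembles the command line. This is a sketch under assumptions: the function name `build_momcertimport_args` and the example host and file names are made up for illustration, it uses only the "/Password" and "/SubjectName" switches described above, and it assumes MomCertImport.exe can be found on the path.

```python
def build_momcertimport_args(pfx_path=None, password=None, subject_name=None,
                             exe="MomCertImport.exe"):
    """Return the argument list for one of the two MomCertImport modes.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    # Exactly one of the two modes must be chosen.
    if (pfx_path is None) == (subject_name is None):
        raise ValueError("give either pfx_path or subject_name, not both")

    if subject_name is not None:
        # Mode 2: point the tool at the cert already installed in the store.
        return [exe, "/SubjectName", subject_name]

    # Mode 1: import from an exported PFX file.
    if password == "":
        # The utility will not take an empty (NULL) password.
        raise ValueError("PFX must be exported with a non-empty password")
    args = [exe, pfx_path]
    if password is not None:
        # If omitted, the tool prompts for the password instead.
        args += ["/Password", password]
    return args


if __name__ == "__main__":
    print(build_momcertimport_args(subject_name="gateway01.example.com"))
    print(build_momcertimport_args(pfx_path="gateway01.pfx"))
```

On the gateway server, the resulting list could be handed to `subprocess.run(args, check=True)`. Leaving the password out of the argument list and letting the tool prompt for it keeps it out of scripts and shell history.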

Now comes the problem of why the catastrophic failure occurred and why everyone says to refer to the documentation. When MomCertImport is run against the file, the cert is opened and then validated against the root authority, while the subject name option assumes the certificate is validly installed and just uses its information. Either option then pulls the serial number from the cert and puts it in the registry for the OpsMgr Health Service to use. While this works fine on domain resources that request certs from enterprise CAs, if your cert was requested from a server or client that is not part of the PKI infrastructure, then the cert you get and install cannot be validated against the root authority. This is because the root CA that issued the cert does not have its own certificate in your trusted roots store.
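The validation step comes down to whether the issuer chain of the cert ends in a root that the machine trusts. Here is a toy Python model of that idea, not how Windows or MomCertImport actually implements it: the function name and the certificate names are invented for illustration, and real validation also checks signatures, validity dates and revocation.

```python
def chains_to_trusted_root(subject, issuer_of, trusted_roots):
    """Follow issuer links from `subject` until a trusted root is found.

    issuer_of maps a certificate's subject to its issuer's subject.
    Toy model only: no signature, expiry or revocation checks.
    """
    seen = set()
    current = subject
    while current not in seen:
        if current in trusted_roots:
            return True
        seen.add(current)
        issuer = issuer_of.get(current)
        if issuer is None:
            # The issuer's certificate is not available at all.
            return False
        current = issuer
    # We looped back to a cert we already saw (e.g. a self-signed root)
    # without ever meeting a trusted one.
    return False


# Hypothetical names: a gateway cert issued by a self-signed root CA.
issuer_of = {
    "gateway01.example.com": "Example Root CA",
    "Example Root CA": "Example Root CA",
}

# Root CA cert missing from the trusted roots store: validation fails.
print(chains_to_trusted_root("gateway01.example.com", issuer_of, set()))
# Root CA cert installed in the trusted roots store: validation succeeds.
print(chains_to_trusted_root("gateway01.example.com", issuer_of,
                             {"Example Root CA"}))
```

In this model, the file-based import behaves like the first call on a machine without the root CA cert, while the subject name option skips the check entirely.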

You can validate this by observing the first page of the details of your certificate. If it says that the certificate is installed but not valid, then you may want to refer to the web document above about "Retrieving and installing the Root CA certificate". Now, the workaround to the catastrophic failure was to use the SubjectName parameter, but you can imagine what my next problem was. I spent a bit of time troubleshooting supposed communication/firewall errors because, even after using the tool and seeing the cert serial number in the registry, I was not able to get the gateway server to talk to the management server. After authorizing the gateway server on the management server (the one that had an external FQDN with a hole for port 5723 into the internal network), I was seeing errors in the event log saying that the connection to the management server had been opened but that the management server had closed it.
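A plain TCP check against port 5723 is a quick way to rule the firewall in or out before chasing certificates. The following is a minimal Python sketch; the function name and the host name in the example are placeholders, not anything from my environment.

```python
import socket


def can_open_tcp(host, port=5723, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder FQDN for the externally reachable management server.
    print(can_open_tcp("opsmgr.example.com"))
```

A successful connection only shows that the port is reachable. It says nothing about whether the certificates are accepted, which fits the symptom above: the connection opens, and then the management server closes it.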

Obviously this was a cert issue. Just as I was going to blow all the certs away and start the process over from the beginning, I saw the cert error in the MMC and it all came to light. Right after adding the Root Authority cert, everything worked. Now this could have been resolved quickly if I had just paid attention to the instructions (although the book didn't refer to this) or if the error message from the utility had been better.

 

So to wrap it up... First, my servers did not catch on fire, so the "Catastrophic Failure" was way overstated (although it did cause some pain). Second, making sure that you have valid root authorities for your installed certificates is important (whether they are yours or from the external CA that issued your cert). Third, don't just blindly follow documents or instructions without understanding why you are filling in specific fields or using specific names. I see this many times when assisting others in troubleshooting certificates on web servers and ConfigMgr.

Actually, I can't believe I didn't catch my cert error in the first place. I hope this helps you save some time, whether it is for OpsMgr or for something else.

Posted: Aug 21 2008, 05:45 PM by jcondo | with no comments