SCOM 2007 - OpsMgr Health Service Stops unexpectedly with error %%2164195332

BACKGROUND (FYI, this article has a new update as of 11-05-2007. See bottom for more details)

This year we deployed SCOM 2007 to watch over our SMS infrastructure for server, DB, SMS, performance or other problems. It's been pretty useful for us so far (the SCOM console is a bit slow, but that's supposedly improved in SP1). Recently, however, the "OpsMgr Health Service" has been stopping unexpectedly and refusing to start again until the server is rebooted.

SYMPTOMS

In the SYSTEM event log we see the following error:

Event Type:      Error
Event Source:    Service Control Manager
Event Category:  None
Event ID:        7023
Description:     The OpsMgr Health Service service terminated with the following error: %%2164195332


At the same time, in the OPERATIONS MANAGER event log we see the following errors (in this order):

Event Type:      Error
Event Source:    Health Service ESE Store
Event Category:  General
Event ID:        486
Description:     HealthService (2240) Health Service Store: An attempt to move the file "D:\System Center Operations Manager 2007\Health Service State\Health Service Store\edb03DAA.log" to "D:\System Center Operations Manager 2007\Health Service State\Health Service Store\edbtmp.log" failed with system error 5 (0x00000005): "Access is denied. ".  The move file operation will fail with error -1032 (0xfffffbf8).

Event Type:      Error
Event Source:    Health Service ESE Store
Event Category:  Logging/Recovery
Event ID:        413
Description:     HealthService (2240) Health Service Store: Unable to create a new logfile because the database cannot write to the log drive. The drive may be read-only, out of disk space, misconfigured, or corrupted. Error -1032.

Event Type:      Error
Event Source:    Health Service ESE Store
Event Category:  Logging/Recovery
Event ID:        492
Description:     HealthService (2240) Health Service Store: The logfile sequence in "D:\System Center Operations Manager 2007\Health Service State\Health Service Store\" has been halted due to a fatal error.  No further updates are possible for the databases that use this logfile sequence.  Please correct the problem and restart or restore from backup.

Event Type:      Error
Event Source:    Health Service ESE Store
Event Category:  Logging/Recovery
Event ID:        471
Description:     HealthService (2240) Health Service Store: Unable to rollback operation #18233124 on database D:\System Center Operations Manager 2007\Health Service State\Health Service Store\HealthServiceStore.edb. Error: -510. All future database updates will be rejected.

<and a few more 471 errors identical to that last one>
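
The numbers in these events are easier to search for in hexadecimal. Here is a small sketch that converts the service exit code and the two ESE error numbers from the events above. The symbolic names in the lookup table are an assumption based on the public ESE (JET) error list; they do not appear in the event text itself.

```python
# Convert the codes from the events above into the hex forms that
# documentation and search results usually use.

# Assumption: these names come from the public ESE (JET) error list,
# not from the event text itself.
ESE_ERROR_NAMES = {
    -1032: "JET_errFileAccessDenied",
    -510: "JET_errLogWriteFail",
}


def to_hex32(code: int) -> str:
    """Return a signed or unsigned 32-bit code as 0xXXXXXXXX."""
    return f"0x{code & 0xFFFFFFFF:08X}"


def describe(code: int) -> str:
    """Format a code with its hex form and, if known, its ESE name."""
    name = ESE_ERROR_NAMES.get(code, "unknown")
    return f"{code} = {to_hex32(code)} ({name})"


if __name__ == "__main__":
    print(to_hex32(2164195332))  # exit code from event 7023
    print(describe(-1032))       # from events 486 and 413
    print(describe(-510))        # from event 471
```

The -1032 conversion matches the 0xfffffbf8 that event 486 already prints, which is a quick sanity check that the masking is right.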

DIAGNOSTIC STEPS WE TOOK

We looked at the folder mentioned in the Operations Manager events (<SCOM INSTALL FOLDER>\Health Service State\Health Service Store). The account running the health service had full control of it, so I ruled out permissions.

We looked at the D: drive, and it had tons of free space, so it wasn't disk space.

We tried to restart the service, no luck.

We restarted the server. That would work for a while, and then after several hours to a couple of days the service would stop again.

CAUSE OF THE PROBLEM

We had recently installed a new version of Symantec AntiVirus (SAV) on our SMS servers from scratch, with no exclusions for the health service store (who knew you needed them?). It turns out that during normal operation the SCOM service generates a lot of disk activity on the files in that Health Service Store folder, and at times the two trip over each other: SAV holds a lock on a file, SCOM can't move it the way it wants to, and the service stops. That's why restarting the service didn't help, and why rebooting did (for a time).

RESOLUTION (Update: see update section below)

  1. Set your antivirus software to exclude the <SCOM INSTALL>\Health Service State\Health Service Store folder from real-time scanning.
  2. Delete all files out of the same <SCOM INSTALL>\Health Service State\Health Service Store folder.
  3. Restart the HealthService again and let it rebuild everything.
  4. Party like it's 1999.

Hope this helps!

 

!!!! UPDATE !!!! 

After a short period of success, we were still having intermittent problems with the health service stopping in exactly the same way. After turning on file access auditing, I realized that Diskeeper (we use version 2007) was ALSO accessing this folder. So there's now another step I'd like to add to the resolution:

RESOLUTION (UPDATED)

  1. Set your antivirus software to exclude the <SCOM INSTALL>\Health Service State\Health Service Store folder from real-time scanning.
  2. Set your Diskeeper or other defrag utility's file/folder exclusion/I-FAAST settings to ignore the <SCOM INSTALL>\Health Service State\Health Service Store folder.
  3. Delete all files out of the same <SCOM INSTALL>\Health Service State\Health Service Store folder.
  4. Restart the HealthService again and let it rebuild everything.
  5. Party like it's 1999.
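
Steps 3 and 4 can be scripted once the exclusions are in place. This is only a minimal sketch under a few assumptions: that the Windows service name is HealthService, that `net stop` and `net start` are run from an elevated prompt, and that the store path matches your install (the one below is copied from the events above). The `run` parameter is a hypothetical hook added so the sequence can be exercised without touching a real service.

```python
import subprocess
from pathlib import Path

# Assumption: path taken from the events above; adjust for your install.
STORE_DIR = Path(r"D:\System Center Operations Manager 2007"
                 r"\Health Service State\Health Service Store")
SERVICE_NAME = "HealthService"  # assumed Windows service name


def clear_store(store_dir: Path) -> int:
    """Delete every file under store_dir and return how many were removed."""
    removed = 0
    # Materialize the listing first so we don't delete while iterating.
    for path in list(store_dir.rglob("*")):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed


def rebuild_store(store_dir: Path, run=subprocess.run) -> int:
    """Stop the service, empty the store, then start the service again."""
    # check=False: the service is usually already stopped when this is
    # needed, and "net stop" exits non-zero in that case.
    run(["net", "stop", SERVICE_NAME], check=False)
    removed = clear_store(store_dir)
    # check=True: fail loudly if the service does not come back up.
    run(["net", "start", SERVICE_NAME], check=True)
    return removed
```

Calling `rebuild_store(STORE_DIR)` on the affected server runs the whole sequence and returns the number of files it removed. If the antivirus or defrag exclusions are not in place yet, the delete can fail with the same kind of access-denied error, so do steps 1 and 2 first.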

 

Number2 (John Nelson) 