SQL Server 2000 Long Running Agent Job and where is its alert?
Ran into a strange one today that I wanted to share; it deals with alert suppression in MOM 2005. If your SQL team cares about getting alerts for long-running jobs, this is probably a rule change you will want to make.
This morning I was informed that the SQL team had not received alerts about long-running SQL jobs on a SQL 2000 server over the weekend. In the Operator Console I created a view of that server's alerts for the past 48 hours, showing all alerts, all resolution values, etc. I found an alert that the long-running job script had created with a severity of Error. Any alert of severity Error or higher should have been sent to their notification group. The alert had a repeat count of about 100. The alert's history showed only:
7/1/2007 4:07:04 AM: NT AUTHORITY\NETWORK SERVICE
Changing AlertLevel of Alert from 30 To 40.
7/1/2007 3:07:03 AM: NT AUTHORITY\NETWORK SERVICE:
Alert is created by Script SQL Server 2000 Long Running Agent Jobs.
Where was the notification event? To figure out what happened, I needed to look at both the rule that runs the script and the script itself. The script is SQL Server 2000 Long Running Agent Jobs. It takes three parameters: ErrorThresholdInMinutes with a value of 120, InformationEvent with a value of False, and WarningThresholdInMinutes with a value of 60. This matters because it means the script can create alerts at two different alert levels.
The rule is Microsoft SQL Server\SQL Server 2000\SQL Server 2000 Health and Availability Monitoring\SQL Server 2000 Long Running Agent Jobs.
Looking at the alert rule's suppression settings gives a good clue as to exactly what happened. Suppress duplicate alerts is enabled on the following fields:
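To make the two alert levels concrete, here is a minimal Python sketch of how the two thresholds could map a job's running time to a level. It is not the management pack's actual script; the function name is hypothetical, the numeric levels 30 and 40 are taken from the alert history above, and using "greater than or equal" for the comparison is an assumption.

```python
from typing import Optional

# Threshold values from the script parameters described in this post.
WARNING_THRESHOLD_MINUTES = 60
ERROR_THRESHOLD_MINUTES = 120

# Numeric alert levels as seen in the alert history (30 -> 40).
WARNING = 30
ERROR = 40


def alert_level_for(runtime_minutes: float) -> Optional[int]:
    """Hypothetical helper: return the alert level for a job's running time.

    Assumes a job alerts once it has run for at least the threshold;
    the real script may compare differently.
    """
    if runtime_minutes >= ERROR_THRESHOLD_MINUTES:
        return ERROR
    if runtime_minutes >= WARNING_THRESHOLD_MINUTES:
        return WARNING
    return None  # job has not run long enough to alert
```

This lines up with the history above: the alert was created at 3:07 AM and raised from 30 to 40 an hour later, which matches the 60 minute gap between the two thresholds.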
Alert Name, Computer, Domain
As it says below the Suppress duplicate alerts setting: fields must be identical for the alert to be considered a duplicate and suppressed.
The Error alert was created by the same rule, with the same Alert Name, Computer, and Domain, while the earlier Warning alert was still unresolved. It was therefore treated as a duplicate: the existing alert's level was raised and its repeat count incremented, but no new alert was created. Until the Severity field is added to the alert suppression field list, no one will be notified when the severity changes from Warning to Error for this rule. To fix this properly, make a copy of the original rule, paste it into the same folder, and rename it to "Modified - SQL Server 2000 Long Running Agent Jobs". Working on a copy means you don't lose your changes when you upgrade the SQL Management Pack. Disable the original rule. Then edit "Modified - SQL Server 2000 Long Running Agent Jobs", click the alert suppression tab, check the box next to the Severity field, and click OK. Don't forget to Commit Configuration Change.
The script can easily change the severity of an alert. However, with the default suppression settings the severity of the existing alert is changed without a new alert being created at the new severity, which can let an alert slip by on a weekend when nobody is watching the console all day.
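The suppression behavior described here can be illustrated with a small Python simulation. This is only a sketch of the idea, not how MOM is implemented: the Alert class, the simulate function, and the computer and domain names SQL01 and CONTOSO are all made up for the example, and notification is simplified to "a newly created alert at Error or higher notifies".

```python
from dataclasses import dataclass, replace

WARNING, ERROR = 30, 40  # alert levels as shown in the alert history


@dataclass
class Alert:
    name: str
    computer: str
    domain: str
    severity: int
    repeat_count: int = 0


def simulate(raised, suppression_fields):
    """Apply duplicate suppression keyed on the given fields.

    Returns (open_alerts, notified). Simplifying assumption: only a
    newly created alert with severity Error or higher is notified.
    """
    open_alerts = {}
    notified = []
    for incoming in raised:
        key = tuple(getattr(incoming, f) for f in suppression_fields)
        if key in open_alerts:
            # Duplicate: update the existing alert, create nothing new.
            existing = open_alerts[key]
            existing.repeat_count += 1
            existing.severity = incoming.severity
            continue
        alert = replace(incoming)  # copy so the input list is not modified
        open_alerts[key] = alert
        if alert.severity >= ERROR:
            notified.append(alert)
    return list(open_alerts.values()), notified


rule = "SQL Server 2000 Long Running Agent Jobs"
raised = [
    Alert(rule, "SQL01", "CONTOSO", WARNING),  # job passes 60 minutes
    Alert(rule, "SQL01", "CONTOSO", ERROR),    # job passes 120 minutes
]

# Default suppression fields: the Error is swallowed as a duplicate.
default_alerts, default_notified = simulate(
    raised, ("name", "computer", "domain"))

# With Severity added: the Error becomes its own alert and notifies.
fixed_alerts, fixed_notified = simulate(
    raised, ("name", "computer", "domain", "severity"))
```

With the default fields the run ends with a single alert whose severity has been bumped to Error and nothing in the notified list. With severity included in the key, the Error shows up as a second alert and is notified.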
Enjoy!