Abstract
I was recently working with Vlad Joanovic, the Microsoft Program Manager for MP development, on a new MP in which a single health service could act as a proxy for thousands of devices. Workflow cook down became a high priority, for obvious reasons: having at least one workflow (and possibly tens) run for each device was something we needed to avoid. After implementing a series of DataSourceModuleTypes, UnitMonitorTypes, Rules, and so forth that fulfilled this requirement, we reflected for a moment on how there isn't really an end-to-end treatment of this subject out there at the moment. This is my attempt at rectifying that.
Background
At the heart of almost all of OpsMgr's activities are workflows. Workflows are collections of modules that run in succession, passing DataItems between them. I have a series of blog posts about a workflow tracer module I wrote that discuss workflows in detail. For more information before continuing on this topic, please refer to those posts.
Given that these workflows are central to OpsMgr and that modules are central to constructing workflows, two design goals of OpsMgr are readily apparent, even from the outside looking in:
- Each module should be as efficient as possible. While you can script anything, you should use the built-in modules (registry readers, WMI readers, etc.) wherever they can do the job. If need be, you can create your own Composite module and be as creative as you like, but prefer the built-in module types whenever possible.
- Each module should be run as infrequently as possible. This is known as cook down. OpsMgr has the basic intelligence necessary to minimize the executions of modules: whenever the effective configuration section of a given module is identical across multiple workflows, that module will only be run once. Its output will be used as input to the next module in line for every workflow in which the cooked-down module is referenced. This is especially true for DataSource modules: they have no input and sit at the beginning of the chain, so they are prime candidates for cook down. I cannot speak directly for ConditionDetection or WriteAction modules, as they have an input DataItem that would seem to nullify the ability to cook them down. I am similarly unsure of the exact behavior of ProbeAction modules with respect to cook down. They have a TriggerOnly attribute that, when true, obviates the need for an input DataItem, so I would suspect that they are candidates, but I have tested only DataSource modules.
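To make the cook-down rule concrete, here is a minimal Python sketch of the idea. It is only a conceptual model, not OpsMgr's implementation: the `Workflow` type, the `cook_down` helper, and the workflow and module names are all hypothetical. The one assumption it encodes is the rule stated above, that data sources with the same module type and an identical effective configuration share a single execution.

```python
from collections import defaultdict
from typing import NamedTuple


class Workflow(NamedTuple):
    name: str          # hypothetical workflow label (monitor/target)
    module_type: str   # type of the DataSource module at the head of the workflow
    config: tuple      # effective configuration as (element, value) pairs


def cook_down(workflows):
    """Group workflows whose data source type and effective config are identical."""
    groups = defaultdict(list)
    for wf in workflows:
        groups[(wf.module_type, wf.config)].append(wf.name)
    return dict(groups)


shared = (("IntervalSeconds", "300"), ("PrincipalName", "server1"))
workflows = [
    Workflow("MonitorA/Printer1", "ScriptDS", shared),
    Workflow("MonitorA/Printer2", "ScriptDS", shared),
    Workflow("MonitorB/Printer1", "ScriptDS", shared),
    # A target-specific element in the data source config prevents sharing.
    Workflow("MonitorC/Printer1", "ScriptDS", shared + (("DeviceID", "Printer1"),)),
]

groups = cook_down(workflows)
for (module_type, _config), names in groups.items():
    print(module_type, "runs once for", names)
```

In this model, each group corresponds to one execution of the data source, so the three workflows with the shared configuration cost one run, while the workflow that passes a target-specific value costs a run of its own.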
In this post, I will develop an MP that uses a scripted DataSourceModuleType that cooks down to one call, but services many workflow instances (monitors). The example I will develop in this MP is a printer monitoring MP. We will endeavor to have a DataSource module that runs once, returns the status of all printers, and is then used in the individual unit monitor workflows for all states of all monitors of all printers. That is useful in and of itself, but the design pattern is absolutely critical, especially where the DataSource module performs operations that are far more expensive than obtaining printer status.
Deviating from the Defaults
Let's examine a script-based monitor created with the UI. There are several standard features of interest that limit the extent to which you can build monitors that cook down through the UI. While you can create a monitor that cooks down to a higher level of abstraction, you cannot have a single script be utilized in more than one monitor. For example, using the UI, you can construct a UnitMonitor that runs once per server and returns the status for all printers, but you will still need multiple executions for multiple monitors. In some cases, this still might be a waste of resources. By authoring the MP directly, we will be able to build our own DataSourceModuleType that will run once per server and service multiple monitors. This is the best design and also allows a single data source to drive monitors and performance collection rules, for example. After we build our own, I will return to the UI-generated monitor and discuss how to build a monitor that cooks down at least partially.
First, here are the screen shots of a simple UnitMonitor created with the UI. This monitor is not functional. I have intentionally left the script body alone and specified deliberate values for the various parts of the monitor. This will help us see where these values appear in the resultant XML in the management pack.
Figure 1: The script template.
Figure 2: The script parameters.
Figure 3: The unhealthy expression.
Figure 4: The healthy expression.
Here is the UnitMonitor that this generates, with some key highlights:
(Figure 5: The UI-generated UnitMonitor)
<UnitMonitor ID="UIGeneratedMonitor..." Accessibility="Public" Enabled="false"
Target="com.focus24.PrinterMonitoring.WindowsPrinter" ParentMonitorID="Health!System.Health.AvailabilityState"
Remotable="true" Priority="Normal" TypeID="Windows!Microsoft.Windows.TimedScript.TwoStateMonitorType" ConfirmDelivery="false">
<Category>Custom</Category>
<OperationalStates>
<OperationalState ID="UIGeneratedOpStateId1..." MonitorTypeStateID="Error" HealthState="Warning"/>
<OperationalState ID="UIGeneratedOpStateId2..." MonitorTypeStateID="Success" HealthState="Success"/>
</OperationalStates>
<Configuration>
<IntervalSeconds>900</IntervalSeconds>
<SyncTime/>
<ScriptName>UIGeneratedMonitor.vbs</ScriptName>
<Arguments>UIGeneratedMonitor Parameters Are Inserted Here</Arguments>
<ScriptBody>
...omitted; it's the simple template...
</ScriptBody>
<TimeoutSeconds>60</TimeoutSeconds>
<ErrorExpression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="String">UnhealthyParameterName</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value Type="String">UnhealthyParameterValue</Value>
</ValueExpression>
</SimpleExpression>
</ErrorExpression>
<SuccessExpression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="String">HealthyParameterName</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value Type="String">HealthyParameterValue</Value>
</ValueExpression>
</SimpleExpression>
</SuccessExpression>
</Configuration>
</UnitMonitor>
The highlights show where the values specified using the UI appear in the MP. I also highlighted the UnitMonitorType that this utilizes. That particular UnitMonitorType is almost entirely a pass-through for the UnitMonitor itself. It defines two MonitorTypeState elements, declares a DataSource and two ConditionDetection modules, and uses those modules in the RegularDetection elements for each state. The reason I have dubbed it "almost entirely a pass-through" is that the Configuration of the DataSource and of the two ConditionDetection modules consists simply of the $Config/...$ expressions for each element. Ergo, everything required by the UnitMonitorType must be specified in the UnitMonitor. As you will see, this arrangement is not conducive to cook down.
For the UnitMonitor itself, note the following:
- For any Configuration element, you have all of the usual replacement expressions available to you: $MPElement$, $Target$, etc. Following the normal pattern, you will need to pass something specifically identifiable about the $Target$ to the script, such that it can test the appropriate target. For instance, for a logical disk monitor, you need to pass the computer name and logical disk name in order to return something meaningful from a script that is used in a monitor that targets a logical disk.
- The script template that is inserted creates a property bag, shows examples of how to set properties in the property bag, and then returns the property bag to OpsMgr. This is also in line with the normal pattern of one invocation returning a property bag for one target.
- The underlying UnitMonitorType is built with ConditionDetection modules that use an XPathQuery to test the value of a particular property (like @Name="State"), setting the monitor state according to the expressions you provide (2 for a two-state monitor, 3 for a three-state monitor). This, at least, always cooks down: the DataSource module is run once and its output is sent to both ConditionDetection modules.
If we are to create a script that can be cooked down, we will have to deviate from this in several key ways:
- The effective Configuration of the underlying DataSource module must have some level of abstraction. By definition, if we're hoping to cook a script down to 1 run for X targets, the script cannot rely on information specific to each target. Therefore, in our example, we will likely want the script to know only the computer name.
- We will want to remove several Configuration elements related to the scripted DataSource from the individual workflow Configuration entirely. This is especially true for the script name and body. While I believe it would be possible for OpsMgr to cook down two completely different workflows based on whether their script name and body matched exactly, this would be an incredible waste of space in the management pack, especially if the script is long and complicated. You would probably have a hard time keeping the copies identical as well, especially if you base a large number of workflows on a single script.
Therefore, we will opt to author a single DataSourceModuleType that is used by several UnitMonitorTypes that are used by even more UnitMonitors. We could also use the DataSourceModuleType in Rules, etc., but we will keep this example to just UnitMonitors. Remember: the goal is to have the DataSource module run once and service every other workflow. Let's move on to building the MP.
Building the MP Shell
We will need several sections of our MP to be built before we can even consider a single monitor. I will not discuss these at length, but feel free to comment with any questions or observations. I will brain dump these for your reference when you review the MP:
- The manifest and references define the MP and set up its references and aliases.
- The class definitions declare a printer class that derives from Microsoft.Windows.LogicalDevice. Since this class is appropriate and already hosted, I thought it was the best candidate. The printers hosted by a server are not necessarily connected directly to the server or even in the same building, but this is still a good choice. I have selected some interesting properties to capture about the printer, but I'll rely on Microsoft.Windows.LogicalDevice and its host, Microsoft.Windows.Computer, to carry the declarative burden of key fields.
- No additional relationships are needed and the dependency monitor for the Windows Server model will already roll up our status accordingly.
- I have declared a DataSourceModuleType (.vbs) to discover the printers. You'll note that each discovered class instance includes key properties for the host. This is required by OpsMgr, but it also allows OpsMgr to generate the hosting relationships for you.
- The DataSourceModuleType exposes its interval, timeout, and sync time as OverrideableParameters.
- That DataSourceModuleType is used in the Discovery. The Discovery targets Microsoft.Windows.Server.2003 computers. I was selective in choosing 2003 because it exposes better WMI classes for printers than the previous versions do. The Discovery is also disabled by default. A group should be created for 2003 servers that are "interesting" print servers and an override targeted to that group should be created to enable the Discovery. This is what I feel is proper form for any management pack. For testing purposes in a lab, you could just enable the Discovery.
- In this case, you'll have to put the new group and overrides in this MP since it will not be sealed. For a sealed MP, the group and override would need to be in some separate unsealed MP. This is one of the main reasons to seal an MP, even if the signing and security aspects are not required: it allows updates to the core MP without affecting operational changes, such as overrides, groups, and group membership.
- In PresentationTypes I declare an Image resource for the class I expose. This is an 80x80 diagram PNG image. I like the impact of having custom images for the diagram view and distributed applications.
- In Presentation, I declare several Views: a state view, a diagram view, and a folder to put them in.
- There are also, of course, DisplayStrings for everything.
Building the Unified Data Source
The unified data source will be a rather simple script, ironically. If we remember that the goal of this design pattern is to have the script do expensive work once, it makes sense that such scripts are commonly simple. This holds when the expense is in retrieving the data from the source (e.g. from a very slow or expensive system or via a very slow or expensive connection). The complexity of the script will increase substantially if the cost is in calculating the data from the source, as when the data is raw and needs to be processed or transformed (e.g. statistical calculations, hashes, cryptographic requirements, etc.). Here, then, is the declaration of the DataSourceModuleType for our example printer monitoring MP:
(Figure 6: The cook-down friendly DataSourceModuleType)
<DataSourceModuleType ID="com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM" Accessibility="Public">
<Configuration>
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="IntervalSeconds" type="xsd:int" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="SyncTime" type="xsd:string" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="TimeoutSeconds" type="xsd:integer" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="PrincipalName" type="xsd:string" />
</Configuration>
<OverrideableParameters>
<OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
<OverrideableParameter ID="SyncTime" Selector="$Config/SyncTime$" ParameterType="string" />
<OverrideableParameter ID="TimeoutSeconds" Selector="$Config/TimeoutSeconds$" ParameterType="int" />
</OverrideableParameters>
<ModuleImplementation>
<Composite>
<MemberModules>
<DataSource ID="DS1" TypeID="Windows!Microsoft.Windows.TimedScript.PropertyBagProvider">
<IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
<SyncTime>$Config/SyncTime$</SyncTime>
<ScriptName>com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM.vbs</ScriptName>
<Arguments>$Config/PrincipalName$</Arguments>
<ScriptBody><![CDATA[
Option Explicit
Dim oWbemServices, oPrinter
Dim oAPI, oBag
Set oAPI = WScript.CreateObject("MOM.ScriptAPI")
' Event 1002 marks the start of a run; we use it later to count invocations.
oAPI.LogScriptEvent "com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM.vbs", 1002, 0, "Printer Monitor Starting"
' Connect to WMI on the computer passed as the only argument (PrincipalName).
Set oWbemServices = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" & WScript.Arguments(0) & "\root\cimv2")
For Each oPrinter in oWbemServices.ExecQuery("SELECT Name,DetectedErrorState,PrinterStatus,Workoffline FROM Win32_Printer WHERE Network=false")
  ' One property bag per printer; DeviceID lets downstream filters match their target.
  Set oBag = oAPI.CreatePropertyBag()
  oBag.AddValue "DeviceID", "Printer:" & oPrinter.Name & ""
  oBag.AddValue "DetectedErrorState", oPrinter.DetectedErrorState & ""
  oBag.AddValue "PrinterStatus", oPrinter.PrinterStatus & ""
  oBag.AddValue "Workoffline", oPrinter.Workoffline & ""
  oAPI.AddItem oBag
Next
' Event 1003 marks the end of a run.
oAPI.LogScriptEvent "com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM.vbs", 1003, 0, "Printer Monitor Ending"
' Return every accumulated property bag in a single output.
oAPI.ReturnItems()
]]></ScriptBody>
<SecureInput />
<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
<EventPolicy />
</DataSource>
</MemberModules>
<Composition>
<Node ID="DS1" />
</Composition>
</Composite>
</ModuleImplementation>
<OutputType>System!System.PropertyBagData</OutputType>
</DataSourceModuleType>
Highlighted in this declaration are the following characteristics of this DataSourceModuleType that make it ready for cook down:
- Of the possible Configuration elements of the underlying Microsoft.Windows.TimedScript.PropertyBagProvider, the only ones that are exposed and subject to change are IntervalSeconds, SyncTime, and TimeoutSeconds. As we will see, these can still cause issues because they can be overridden; however, this is a risk that must be endured since a module that cannot be configured through overrides is bound to be problematic in any environment.
- We have added another Configuration element, PrincipalName. This is passed as the sole item in the Arguments element. Having this be a blatantly named Configuration element should ensure that the script is implemented as we require it to be to take advantage of the cook down we're designing into it.
- For the script itself, note that we are iterating through all printers and using the CreatePropertyBag method of the scripting API for each instance.
- Note that each property bag itself contains values that you can already predict will be useful to multiple Monitors, Rules, etc. Again, we are continuing to build on the concept that while we are accessing the source, we should gather as much information as we can to minimize the expense.
- The per-instance property bag is added to the output using the scripting API's AddItem method, which is not a well-known method, since it does not appear in the simple template the UI provides.
- Since there have been multiple property bag items added to the output, the script ends with a call to the ReturnItems method, which takes no parameters, rather than the usual Return with the property bag as the parameter.
- Finally, note the LogScriptEvent method calls, which log events 1002 and 1003. These will be important when we confirm the number of invocations of our script.
- The three script characteristics above (one property bag per printer, the AddItem call, and the final ReturnItems call) deviate from the UI-provided template and form the basis for a unified DataSourceModuleType. The point here is that the DataSource runs once and returns enough information to satisfy multiple workflows for multiple targets.
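The risk that overrides pose to cook down can be illustrated with a short Python sketch. This is a conceptual model only, not how OpsMgr computes anything internally: `effective_config`, `executions`, the workflow labels, and the configuration values are all hypothetical. It assumes that an override simply replaces a value in the data source's effective configuration, and that each distinct effective configuration costs one script run.

```python
def effective_config(defaults, overrides):
    """Merge overrides onto the default data source configuration."""
    merged = dict(defaults)
    merged.update(overrides)
    return tuple(sorted(merged.items()))


def executions(defaults, overrides_by_workflow):
    """Count distinct effective configurations, i.e. script runs per interval."""
    configs = {effective_config(defaults, o) for o in overrides_by_workflow.values()}
    return len(configs)


# Illustrative default values for the four exposed Configuration elements.
defaults = {
    "IntervalSeconds": "300",
    "SyncTime": "",
    "TimeoutSeconds": "150",
    "PrincipalName": "server1",
}

no_overrides = {"Error/Printer1": {}, "Error/Printer2": {}, "Offline/Printer1": {}}
# Overriding the interval on a single workflow gives it its own configuration.
one_override = {**no_overrides, "Offline/Printer1": {"IntervalSeconds": "600"}}

runs_before = executions(defaults, no_overrides)
runs_after = executions(defaults, one_override)
print(runs_before, runs_after)
```

Under these assumptions, a single override on one workflow is enough to split the shared data source into two separate executions, which is why overrides on IntervalSeconds, SyncTime, and TimeoutSeconds should be applied consistently across every workflow that shares the module.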
The XML representation of this script's output also deviates from the traditional DataItem by wrapping multiple DataItems in a Collection element. For example:
(Figure 7: An example Collection DataItem)
<Collection>
<DataItem ...>
<Property Name="DeviceID" ...>Printer:NameOfPrinter1</Property>
<Property Name="DetectedErrorState" ...>...</Property>
</DataItem>
<DataItem ...>
<Property Name="DeviceID" ...>Printer:NameOfPrinter2</Property>
<Property Name="DetectedErrorState" ...>...</Property>
</DataItem>
</Collection>
The behavior that makes this useful is that OpsMgr will now send each DataItem in the Collection to the next module in any workflow that utilizes this DataSource for any target, as long as the effective Configuration is the same. For us, this means that subsequent modules for every workflow for every target on a single machine will be serviced by a single run of this DataSource.
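The fan-out can be pictured with a small Python sketch that parses a Collection like the one above and hands each DataItem to the workflow whose target it matches. This is only an illustration of the idea: the `items_for` helper is hypothetical, the DetectedErrorState values are made up for the example, and the real DataItem elements carry attributes that are omitted here.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the script output; the state values are invented.
collection_xml = """
<Collection>
  <DataItem>
    <Property Name="DeviceID">Printer:NameOfPrinter1</Property>
    <Property Name="DetectedErrorState">2</Property>
  </DataItem>
  <DataItem>
    <Property Name="DeviceID">Printer:NameOfPrinter2</Property>
    <Property Name="DetectedErrorState">1</Property>
  </DataItem>
</Collection>
"""


def items_for(device_id, collection):
    """Return the DataItems whose DeviceID property equals device_id."""
    matches = []
    for item in collection.findall("DataItem"):
        prop = item.find("Property[@Name='DeviceID']")
        if prop is not None and prop.text == device_id:
            matches.append(item)
    return matches


collection = ET.fromstring(collection_xml)

# One parsed output serves every target; each workflow keeps only its own item.
for device in ("Printer:NameOfPrinter1", "Printer:NameOfPrinter2"):
    for item in items_for(device, collection):
        state = item.find("Property[@Name='DetectedErrorState']").text
        print(device, state)
```

The query `Property[@Name='DeviceID']` is the same relative path that the ConditionDetection filters use against each DataItem, which is why the script must emit a property that identifies the target.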
Our next order of business will be to utilize this module in some UnitMonitorTypes.
Building the Unit Monitor Types
The design pattern for every workflow that will use our DataSourceModuleType is essentially the same: specify our module as the start and then have its output fed to ConditionDetection modules that will filter down the output to just what we need for the particular workflow and target. We will see concrete examples of UnitMonitorTypes, but this pattern applies to any type of workflow that can consume the output. We'll talk through a few of these other examples after we complete the sample MP. These examples are long, based on the verbosity of the UnitMonitorType schema, but worth including in their entirety. Here, then, are two UnitMonitorTypes that utilize the unified DataSourceModuleType, with some highlights we will discuss below:
(Figure 8: Two UnitMonitorTypes based on our DataSourceModuleType)
<UnitMonitorType ID="com.focus24.PrinterMonitoring.WindowsPrinter.ErrorStateUMT" Accessibility="Internal">
<MonitorTypeStates>
<MonitorTypeState ID="StateIsHealthy" NoDetection="false"/>
<MonitorTypeState ID="StateIsWarning" NoDetection="false"/>
<MonitorTypeState ID="StateIsCritical" NoDetection="false"/>
</MonitorTypeStates>
<Configuration>
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="IntervalSeconds" type="xsd:int" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="SyncTime" type="xsd:string" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="TimeoutSeconds" type="xsd:integer" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="PrincipalName" type="xsd:string" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="DeviceID" type="xsd:string" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="SpecificState" type="xsd:string" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="Warning0Critical1" type="xsd:string" />
</Configuration>
<OverrideableParameters>
<OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
<OverrideableParameter ID="SyncTime" Selector="$Config/SyncTime$" ParameterType="string" />
<OverrideableParameter ID="TimeoutSeconds" Selector="$Config/TimeoutSeconds$" ParameterType="int" />
<OverrideableParameter ID="Warning0Critical1" Selector="$Config/Warning0Critical1$" ParameterType="string" />
</OverrideableParameters>
<MonitorImplementation>
<MemberModules>
<DataSource ID="DS1" TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM">
<IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
<SyncTime>$Config/SyncTime$</SyncTime>
<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
<PrincipalName>$Config/PrincipalName$</PrincipalName>
</DataSource>
<ConditionDetection ID="FilterForStateIsHealthy" TypeID="System!System.ExpressionFilter">
<Expression>
<And>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>$Config/DeviceID$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='DetectedErrorState']</XPathQuery>
</ValueExpression>
<Operator>NotEqual</Operator>
<ValueExpression>
<Value>$Config/SpecificState$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</And>
</Expression>
</ConditionDetection>
<ConditionDetection ID="FilterForStateIsWarning" TypeID="System!System.ExpressionFilter">
<Expression>
<And>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>$Config/DeviceID$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='DetectedErrorState']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>$Config/SpecificState$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<Value>$Config/Warning0Critical1$</Value>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>0</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</And>
</Expression>
</ConditionDetection>
<ConditionDetection ID="FilterForStateIsCritical" TypeID="System!System.ExpressionFilter">
<Expression>
<And>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>$Config/DeviceID$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='DetectedErrorState']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>$Config/SpecificState$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<Value>$Config/Warning0Critical1$</Value>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>1</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</And>
</Expression>
</ConditionDetection>
</MemberModules>
<RegularDetections>
<RegularDetection MonitorTypeStateID="StateIsHealthy">
<Node ID="FilterForStateIsHealthy">
<Node ID="DS1"/>
</Node>
</RegularDetection>
<RegularDetection MonitorTypeStateID="StateIsWarning">
<Node ID="FilterForStateIsWarning">
<Node ID="DS1"/>
</Node>
</RegularDetection>
<RegularDetection MonitorTypeStateID="StateIsCritical">
<Node ID="FilterForStateIsCritical">
<Node ID="DS1"/>
</Node>
</RegularDetection>
</RegularDetections>
</MonitorImplementation>
</UnitMonitorType>
<UnitMonitorType ID="com.focus24.PrinterMonitoring.WindowsPrinter.WorkingOfflineUMT" Accessibility="Internal">
<MonitorTypeStates>
<MonitorTypeState ID="StateIsHealthy" NoDetection="false"/>
<MonitorTypeState ID="StateIsWarning" NoDetection="false"/>
<MonitorTypeState ID="StateIsCritical" NoDetection="false"/>
</MonitorTypeStates>
<Configuration>
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="IntervalSeconds" type="xsd:int" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="SyncTime" type="xsd:string" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="TimeoutSeconds" type="xsd:integer" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="PrincipalName" type="xsd:string" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="DeviceID" type="xsd:string" />
<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="Warning0Critical1" type="xsd:string" />
</Configuration>
<OverrideableParameters>
<OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
<OverrideableParameter ID="SyncTime" Selector="$Config/SyncTime$" ParameterType="string" />
<OverrideableParameter ID="TimeoutSeconds" Selector="$Config/TimeoutSeconds$" ParameterType="int" />
<OverrideableParameter ID="Warning0Critical1" Selector="$Config/Warning0Critical1$" ParameterType="string" />
</OverrideableParameters>
<MonitorImplementation>
<MemberModules>
<DataSource ID="DS1" TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM">
<IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
<SyncTime>$Config/SyncTime$</SyncTime>
<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
<PrincipalName>$Config/PrincipalName$</PrincipalName>
</DataSource>
<ConditionDetection ID="FilterForStateIsHealthy" TypeID="System!System.ExpressionFilter">
<Expression>
<And>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>$Config/DeviceID$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='Workoffline']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>False</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</And>
</Expression>
</ConditionDetection>
<ConditionDetection ID="FilterForStateIsWarning" TypeID="System!System.ExpressionFilter">
<Expression>
<And>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>$Config/DeviceID$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='Workoffline']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>True</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<Value>$Config/Warning0Critical1$</Value>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>0</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</And>
</Expression>
</ConditionDetection>
<ConditionDetection ID="FilterForStateIsCritical" TypeID="System!System.ExpressionFilter">
<Expression>
<And>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>$Config/DeviceID$</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery>Property[@Name='Workoffline']</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>True</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<Value>$Config/Warning0Critical1$</Value>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value>1</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</And>
</Expression>
</ConditionDetection>
</MemberModules>
<RegularDetections>
<RegularDetection MonitorTypeStateID="StateIsHealthy">
<Node ID="FilterForStateIsHealthy">
<Node ID="DS1"/>
</Node>
</RegularDetection>
<RegularDetection MonitorTypeStateID="StateIsWarning">
<Node ID="FilterForStateIsWarning">
<Node ID="DS1"/>
</Node>
</RegularDetection>
<RegularDetection MonitorTypeStateID="StateIsCritical">
<Node ID="FilterForStateIsCritical">
<Node ID="DS1"/>
</Node>
</RegularDetection>
</RegularDetections>
</MonitorImplementation>
</UnitMonitorType>
Here are the highlights:
- First, I've added three more specific elements to the Configuration element: DeviceID, SpecificState, and Warning0Critical1. These will be used to identify the target, the specific state to be detected, and the criticality of the condition, respectively. Note that none of these are passed to the DataSource, so it is insulated from any variations related to the target or the particular monitor. Again, this points to OpsMgr's ability to use a single run of the DataSource module for multiple targets and workflows.
- I've also made Warning0Critical1 an OverrideableParameter. Using a three state monitor with this parameter gives us the ability to control the criticality of the actual monitor, not just the alert. The monitor's criticality is usually fixed.
- The ConditionDetection modules for the different states are much more complicated than before. Now, we have two or three expressions that must be And'ed together.
- The first expression matches only DataItems related to this DeviceID (Target).
- The second expression matches whether or not we are in the SpecificState (which is designed here as Unhealthy). Note the difference between the Healthy ConditionDetection module and the Warning/Critical ConditionDetection modules.
- The third expression (only on the Warning and Critical ConditionDetection modules) matches our Warning0Critical1 parameter and matches only for the appropriate criticality.
- These design patterns follow for the second UnitMonitorType, though there is no need for a SpecificState Configuration element because it is testing a boolean condition.
- Note that we should not use $Target$ references here. Although they seem like they would fit, they in fact do not and their behavior is indeterminate, based on my observations. We will use $Config$ references here and fill them with $Target$ references when we utilize the UnitMonitorType in a UnitMonitor.
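The combined effect of the three ConditionDetection modules in the first UnitMonitorType can be summarized as a single decision function. The Python sketch below is a model of that logic, not OpsMgr code: `detect_state` is a hypothetical name, the DataItem is represented as a plain dictionary, and all comparisons are assumed to be string comparisons.

```python
def detect_state(item, device_id, specific_state, warning0critical1):
    """Return the MonitorTypeState a DataItem would trigger, or None if no filter matches."""
    # First expression: ignore DataItems that belong to other targets.
    if item["DeviceID"] != device_id:
        return None
    # Second expression: anything other than the watched state is healthy.
    if item["DetectedErrorState"] != specific_state:
        return "StateIsHealthy"
    # Third expression: the overrideable parameter picks the criticality.
    if warning0critical1 == "0":
        return "StateIsWarning"
    if warning0critical1 == "1":
        return "StateIsCritical"
    return None


item = {"DeviceID": "Printer:A", "DetectedErrorState": "1"}
print(detect_state(item, "Printer:A", "1", "0"))
print(detect_state(item, "Printer:A", "1", "1"))
print(detect_state(item, "Printer:B", "1", "0"))
```

Because the three filters are mutually exclusive for a given Warning0Critical1 value, at most one state is detected per DataItem. Note that in this model a Warning0Critical1 value other than 0 or 1 leaves the unhealthy condition undetected, since neither the Warning nor the Critical filter matches.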
The full picture of how this design pattern comes together should be starting to form at this point. The only step left is to craft a few UnitMonitors that are based on these UnitMonitorTypes.
Building the Unit Monitors
I can keep these examples simple. Since there are a dozen or so error states for a printer, we could create that many UnitMonitors. We could also re-visit our DataSource script and normalize the values somewhat (i.e. return a few discrete values for which we want monitors and return anything else as a single value we could roll into an "Other Problem" UnitMonitor). For our purposes, I will create three: two based on the first UnitMonitorType and one based on the second. This will be another large section of management pack XML, but well worth the review.
(Figure 9: Three UnitMonitors based on our two UnitMonitorTypes)
<UnitMonitor ID="com.focus24.PrinterMonitoring.WindowsPrinter.Error.Monitor"
             Accessibility="Public" Enabled="true"
             Target="com.focus24.PrinterMonitoring.WindowsPrinter"
             ParentMonitorID="Health!System.Health.AvailabilityState"
             Remotable="true" Priority="Normal"
             TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.ErrorStateUMT"
             ConfirmDelivery="false">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="com.focus24.PrinterMonitoring.WindowsPrinter.Error.Alert">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Warning</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter1>
      <AlertParameter2>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="ErrorCritical" MonitorTypeStateID="StateIsCritical" HealthState="Error"/>
    <OperationalState ID="ErrorWarning" MonitorTypeStateID="StateIsWarning" HealthState="Warning"/>
    <OperationalState ID="ErrorHealthy" MonitorTypeStateID="StateIsHealthy" HealthState="Success"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>300</IntervalSeconds>
    <SyncTime />
    <TimeoutSeconds>150</TimeoutSeconds>
    <PrincipalName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PrincipalName>
    <DeviceID>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</DeviceID>
    <SpecificState>1</SpecificState>
    <Warning0Critical1>0</Warning0Critical1>
  </Configuration>
</UnitMonitor>
<UnitMonitor ID="com.focus24.PrinterMonitoring.WindowsPrinter.Offline.Monitor"
             Accessibility="Public" Enabled="true"
             Target="com.focus24.PrinterMonitoring.WindowsPrinter"
             ParentMonitorID="Health!System.Health.AvailabilityState"
             Remotable="true" Priority="Normal"
             TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.ErrorStateUMT"
             ConfirmDelivery="false">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="com.focus24.PrinterMonitoring.WindowsPrinter.Offline.Alert">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter1>
      <AlertParameter2>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="OfflineCritical" MonitorTypeStateID="StateIsCritical" HealthState="Error"/>
    <OperationalState ID="OfflineWarning" MonitorTypeStateID="StateIsWarning" HealthState="Warning"/>
    <OperationalState ID="OfflineHealthy" MonitorTypeStateID="StateIsHealthy" HealthState="Success"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>300</IntervalSeconds>
    <SyncTime />
    <TimeoutSeconds>150</TimeoutSeconds>
    <PrincipalName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PrincipalName>
    <DeviceID>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</DeviceID>
    <SpecificState>9</SpecificState>
    <Warning0Critical1>1</Warning0Critical1>
  </Configuration>
</UnitMonitor>
<UnitMonitor ID="com.focus24.PrinterMonitoring.WindowsPrinter.WorkingOffline.Monitor"
             Accessibility="Public" Enabled="true"
             Target="com.focus24.PrinterMonitoring.WindowsPrinter"
             ParentMonitorID="Health!System.Health.AvailabilityState"
             Remotable="true" Priority="Normal"
             TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.WorkingOfflineUMT"
             ConfirmDelivery="false">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="com.focus24.PrinterMonitoring.WindowsPrinter.WorkingOffline.Alert">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter1>
      <AlertParameter2>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="WorkingOfflineCritical" MonitorTypeStateID="StateIsCritical" HealthState="Error"/>
    <OperationalState ID="WorkingOfflineWarning" MonitorTypeStateID="StateIsWarning" HealthState="Warning"/>
    <OperationalState ID="WorkingOfflineHealthy" MonitorTypeStateID="StateIsHealthy" HealthState="Success"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>300</IntervalSeconds>
    <SyncTime />
    <TimeoutSeconds>150</TimeoutSeconds>
    <PrincipalName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PrincipalName>
    <DeviceID>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</DeviceID>
    <Warning0Critical1>1</Warning0Critical1>
  </Configuration>
</UnitMonitor>
Here, the highlights are just to call your attention to how the UnitMonitorType is utilized by two different UnitMonitors. They also differ in their use of the Warning0Critical1 element, which is also reflected in the default AlertOnState and AlertSeverity elements. If you follow the values specified here back through the UnitMonitorType and then back to the DataSourceModuleType, you will see where the $Target/Host/.../PrincipalName$ propagates back to the DataSourceModuleType while the $Target/.../DeviceID$ only propagates as far back as the ConditionDetection modules in the UnitMonitorType. The same is true for the Warning0Critical1 Configuration element and, for the first two, the SpecificState element. Hence, the DataSourceModuleType will cook down to one run for all targets of all three monitors.
This essentially completes our management pack. Now, let's see what activity this generates in a test environment. The test environment represented here consists of two machines: SCOM-WIN2K3, an agent-managed machine with three printers installed, and SCOM-SERVER, the RMS, also with three printers installed. For demonstration purposes, I've set one printer on each machine to Work Offline. I've also set one printer on each machine to use a network port at 127.0.0.1 and then generated a test page, which put the printer into an error status. Let's see that configuration first:
(Figures 10 and 11: The Printer Configuration from Each Test Server)
Next, let's see what our LogScriptEvent method call recorded on both machines (events 1002 and 1003):
(Figures 12 and 13: Event Log Excerpts from Each Test Server)
So far, so good. Both show the new configuration becoming active and then one and only one invocation of our DataSource module. Let's see what the state view says:
(Figure 14: The State View from OpsMgr)
Also good. Two printers (targets) from two different servers are showing two different severities. We can confirm that this one invocation of our script has fed multiple workflows by reviewing the alerts generated:
(Figure 15: The Alerts View from OpsMgr)
This is exactly what we expected to see, which confirms the behavior we designed for. Finally, just because we bothered to include a diagram icon for this class:
(Figure 16: The Diagram View from OpsMgr)
Other Workflow Types
As promised, we can easily talk through how this would be implemented for different types of workflows:
- For a performance collection Rule:
- The unified DataSource module (or another similarly designed) produces a Collection.
- The Collection is consumed by a ConditionDetection module to select only the DataItems related to performance metrics for a specific target.
- Another ConditionDetection module receives that particular DataItem and maps it to a System.Performance.Data DataItem.
- One or more WriteActions store the DataItem in the DB/DW.
- Essentially, this is a re-implementation of the built-in scripted performance composite module that ships with OpsMgr, utilizing a unified DataSource module and an additional ConditionDetection module for filtering just before the ConditionDetection module that does the mapping.
- For an event collection Rule:
- The unified DataSource module (or another similarly designed) produces a Collection.
- The Collection is consumed by a ConditionDetection module to select only the DataItems related to events for a specific target.
- Another ConditionDetection module receives that particular DataItem and maps it to a System.Event.Data DataItem.
- The DataItem is consumed by OpsMgr.
- This can similarly be thought of as a re-implementation of the built-in scripted event provider, with similar changes as noted for the performance example above.
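The performance case above can be sketched as a composite DataSourceModuleType. This is only a rough outline under several assumptions: the unified DataSource's TypeID (UnifiedDS), the composite's own ID, the JobCount property name, the object and counter names, and the System and Perf reference aliases are all hypothetical placeholders rather than names taken from the MP in this article. The filter uses System.ExpressionFilter and the mapping uses System.Performance.DataGenericMapper.

```xml
<!-- Sketch only: IDs, aliases, and the JobCount property are hypothetical -->
<DataSourceModuleType ID="com.focus24.PrinterMonitoring.WindowsPrinter.JobCount.PerfDS"
                      Accessibility="Internal" Batching="false">
  <Configuration>
    <xsd:element name="IntervalSeconds" type="xsd:integer" />
    <xsd:element name="SyncTime" type="xsd:string" />
    <xsd:element name="TimeoutSeconds" type="xsd:integer" />
    <xsd:element name="PrincipalName" type="xsd:string" />
    <xsd:element name="DeviceID" type="xsd:string" />
  </Configuration>
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <!-- Unified DataSource: receives only host-level values, so its
             configuration is identical for every printer on the host -->
        <DataSource ID="DS" TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.UnifiedDS">
          <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
          <SyncTime>$Config/SyncTime$</SyncTime>
          <TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
          <PrincipalName>$Config/PrincipalName$</PrincipalName>
        </DataSource>
        <!-- Additional filter: keep only the DataItem for this target -->
        <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">Property[@Name='DeviceID']</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">$Config/DeviceID$</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
        </ConditionDetection>
        <!-- Mapper: turn the surviving property bag into performance data -->
        <ConditionDetection ID="Mapper" TypeID="Perf!System.Performance.DataGenericMapper">
          <ObjectName>Printer</ObjectName>
          <CounterName>Jobs</CounterName>
          <InstanceName>$Config/DeviceID$</InstanceName>
          <Value>$Data/Property[@Name='JobCount']$</Value>
        </ConditionDetection>
      </MemberModules>
      <Composition>
        <Node ID="Mapper">
          <Node ID="Filter">
            <Node ID="DS" />
          </Node>
        </Node>
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>Perf!System.Performance.Data</OutputType>
</DataSourceModuleType>
```

Only the Filter and Mapper members see the DeviceID, so the DS member should remain a candidate for cook down across all printers on a host. The event case would follow the same shape, with an event mapper in place of the performance mapper and an event output type.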
Revisiting the UI-Generated Monitor
Also, as promised, let's try to apply what we've seen here to the UI-generated monitor and discuss how cook down might be implemented at least for individual workflows and possibly for multiple workflows. To realize the same type of cook down in a UI-generated script:
- Only specify the highest level of abstraction in the arguments. Do not include arguments based on the $Target$. Only use arguments as granular as $Target/Host$, for example.
- Do not use the pattern given by the example script (i.e. one property bag returned with the Return method). Instead, access the source and return multiple property bags: create each one with CreatePropertyBag, add each one with AddItem, and submit them all with ReturnItems.
- In each property bag, instead of including just a property for the state of a single target, you must include properties to identify the target (such as the DeviceID in our example). This is required since you will be returning multiple property bags. They are not very valuable if they cannot be differentiated!
- For the healthy, warning, and/or critical expressions, use an And that filters based on the $Target$ and the state of that target.
- That should make the underlying DataSource eligible for cook down.
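As a rough illustration of the And expression described above, the unhealthy expression for a UI-generated monitor might look something like the fragment below. The State property name and its value of 1 are hypothetical; DeviceID is assumed to be the identifying property the script places in each property bag. The fragment would sit inside whichever expression element the monitor type defines for that health state.

```xml
<!-- Sketch only: 'State' and its value are hypothetical property bag contents -->
<And>
  <!-- First expression: keep only the property bag for this target -->
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <XPathQuery Type="String">Property[@Name='DeviceID']</XPathQuery>
      </ValueExpression>
      <Operator>Equal</Operator>
      <ValueExpression>
        <Value Type="String">$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
  <!-- Second expression: test the state reported for that target -->
  <Expression>
    <SimpleExpression>
      <ValueExpression>
        <XPathQuery Type="Integer">Property[@Name='State']</XPathQuery>
      </ValueExpression>
      <Operator>Equal</Operator>
      <ValueExpression>
        <Value Type="Integer">1</Value>
      </ValueExpression>
    </SimpleExpression>
  </Expression>
</And>
```

The $Target$ reference appears only in the filter expression, never in the script arguments, which is what keeps the script's configuration identical across targets.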
You can even create multiple monitors that will all cook down to a single run using the UI, but this, as I mentioned earlier, is quite a hack:
- Follow all of the guidelines above.
- In your script, include in the property bag multiple states or measurements for each target, identifying them by property name. Alternatively, you could include multiple property bags per target, but at least one property would need to identify the state or measurement to which each property bag applies.
- Add another expression under the And expression to further filter the collection of property bags to just the particular state or measurement each particular monitor or rule needs.
- Duplicate the arguments, timeout, interval, and script body exactly in each monitor or rule you create. This is perhaps the single biggest reason I feel that this method is a complete hack.
- In theory, this will allow OpsMgr to cook down all of the UI-created workflows together.
Limitations
The one caveat with the unified model is that you must be very careful when dealing with state that represents a "present/missing" situation. If the presence of something generates an unhealthy condition, that is easy enough to include in the Collection. The issue is that if, on the next run, that presence is now missing, you must be able to include an appropriate DataItem in the Collection. In some cases, this might be difficult, especially since the unified DataSource does not have access to the inventory of targets or their current state.
For a more concrete example, consider the fictitious example of a DataSource that examines the conditions of long-running jobs. There would need to be a Discovery that creates the instances of the jobs. Then, any job that was flagged by the DataSource module would have a corresponding DataItem in the Collection, which would be examined by the appropriate ConditionDetection modules in the UnitMonitorType and would generate the appropriate state change. If, on a subsequent run, the job no longer existed, the monitor would never change back to Healthy. This is because we're assuming that the DataSource would have trouble reporting the status of a job that no longer exists. This will cause the issue to linger in OpsMgr, especially if the Discovery run interval is much greater than the monitor's interval (as is usually the case).
Another example that would not be fixed even by the next Discovery run would be a DataSource module that examines the state of hundreds of persistent connections to a service. Using another table of open transactions, the DataSource flags any transaction that is older than a particular number of minutes. The DataSource would ostensibly return a Collection of all connections that have at least one old transaction. This would similarly be processed and set the state accordingly. On subsequent runs, the DataSource would not have the context of what it had set as unhealthy before. In this case, to compensate, you would probably want to ensure that every connection was represented in the Collection returned, defaulting all to healthy unless otherwise determined to be unhealthy. Depending on the exact incarnation of this issue, that may or may not be a reasonable thing to do. If it is not, you would likely need to have your DataSource be stateful somehow, by means external to OpsMgr, such that it can return to healthy anything it had previously set to unhealthy.
This is a general limitation of the design pattern represented here, but should be easily identifiable. Such cases will require additional work to balance the need for granular invocations with the efficiencies of cook down.
DataItem Overload
Another side-effect of using a cooked down design model that should be considered is the limitation imposed by the dropping algorithm of OpsMgr. When there is too much data outstanding in a rule, OpsMgr will drop DataItems until the rule is no longer overloaded. The maximum number of DataItems that may be outstanding in a rule is 128. The overall number of outstanding DataItems for all rules active on a HealthService is 5,120. These values are not configurable. This behavior is by design, because most often, the presence of that many outstanding DataItems indicates a problem with MonitoringHost.exe. In the case of cooked down designs, this can present a problem. Returning to our example, consider an environment that has a large number of printers installed. By large, I mean a number much greater than 128. When the cooked down script returns its Collection, there will be one DataItem per printer contained in it. Those DataItems will fan out to the various receiving modules in the workflow. Depending on the processing speed of the system on which the workflows are running, it is quite possible to exceed the 128 limit and even the 5,120 limit (depending on how many distinct iterations of this design pattern are present). There is currently no work-around for this, except to watch for these issues (Event ID 4506 in the Operations Manager log) and further balance the design between fewer distinct invocations (producing larger Collections) and smaller Collections (produced by more distinct invocations).
The Issues with Overrides
A very simple reality can have very drastic consequences when we consider designing an MP according to this pattern. Overrides, by definition, change the effective configuration of a workflow and can be targeted very granularly. This can cause the unified (and expensive and monolithic) script to be invoked multiple times. Some of those invocations could assemble the entire Collection only to be consumed by a single target. This is most definitely a balancing act. Be aware of the ramifications of overrides. Document them clearly in your MP Guide. Where possible, try to compensate with good design.
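To make the trade-off concrete, here is a hedged sketch of two overrides aimed at a single printer instance. The override IDs and the all-zero ContextInstance GUID are placeholders, and the second override assumes that IntervalSeconds is exposed as an overrideable parameter and is passed through to the DataSource, as the monitor configuration above suggests.

```xml
<!-- Sketch only: IDs and the ContextInstance GUID are placeholders -->
<Overrides>
  <!-- Touches only a value consumed by the ConditionDetection modules, so the
       DataSource configuration is unchanged and should still cook down -->
  <MonitorConfigurationOverride ID="Example.Override.Error.Severity"
      Context="com.focus24.PrinterMonitoring.WindowsPrinter"
      ContextInstance="00000000-0000-0000-0000-000000000000"
      Enforced="false"
      Monitor="com.focus24.PrinterMonitoring.WindowsPrinter.Error.Monitor"
      Parameter="Warning0Critical1">
    <Value>1</Value>
  </MonitorConfigurationOverride>
  <!-- Changes a value that reaches the DataSource, so this one printer would
       get its own, separate run of the unified script -->
  <MonitorConfigurationOverride ID="Example.Override.Error.Interval"
      Context="com.focus24.PrinterMonitoring.WindowsPrinter"
      ContextInstance="00000000-0000-0000-0000-000000000000"
      Enforced="false"
      Monitor="com.focus24.PrinterMonitoring.WindowsPrinter.Error.Monitor"
      Parameter="IntervalSeconds">
    <Value>600</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

If the overrides live in a separate, unsealed MP, the Context and Monitor values would carry that MP's reference alias for the printer MP.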
Another issue related to overrides that arises with cook down is the granularity with which you can disable the actions a workflow is taking. If there are individual workflows for each and every instance of the target, you can prevent a specific workflow from running at all for a particular target by disabling it. This includes the DataSource invocation against that particular target. When you implement a cooked down version of the workflow, you can disable the specific workflow that evaluates the results of the DataSource, but the DataSource will retrieve the source data from that target regardless. This is because by definition, the DataSource is no longer target-specific, but rather assembles the data for all targets within a particular (larger) scope. The only scenario where this could be a problem is where the actual collection of some data point from a target is undesirable in some cases. For example, if you had a printer whose driver caused a significant delay and system load whenever it was interrogated by WMI, you might wish to override the workflow against that particular printer. Although you can override the workflow that evaluates the data returned from the DataSource for that particular target, the data point will still be retrieved. As you can probably imagine, the design pattern that allows for cook down and for granular exclusion could become quite complex. This is another trade-off, so be aware of the ramifications of this in your design.
Conclusion
Designing quality management packs is a fairly broad mix of requirements. Modeling your class hierarchy to represent what you need to manage and still fit into the OpsMgr paradigm is the first step. Planning to utilize everything OpsMgr has to offer is another important step (e.g. Distributed Applications, Reports, etc.). Intelligently instrumenting whatever it is you are managing such that OpsMgr reports relevant, timely, and "right-sized" amounts of information is usually the step on which we concentrate the most. Perhaps the most important step, though, is to design your MP to live harmoniously alongside other MPs in environments where the scope of what your MP monitors may be a small part of the larger environment. To that end, I offer this treatment of cook down for the general good.
Comments, as always, are welcomed and appreciated.
VJD/<><
You can download the whole MP here.