Workflow Cook-Down in Operations Manager 2007

Abstract

I was recently working with Vlad Joanovic, the Microsoft Program Manager for MP development, on a new MP that has the potential to have a single health service act as a proxy for thousands of devices.  The subject of workflow cook-down became a high priority, for obvious reasons.  Running at least one workflow (and possibly tens) for each device was something we certainly needed to avoid.  After implementing a series of DataSourceModuleTypes, UnitMonitorTypes, Rules, and so forth that effectively fulfilled this requirement, we reflected for a moment on how there isn't really an end-to-end treatment of this subject out there at the moment.  This is my attempt at rectifying that.

Background

At the heart of almost all OpsMgr's activities are workflows.  Workflows are collections of modules that run in succession, passing DataItems between them.  I have a series of blog posts related to a workflow tracer module I wrote that contain a detailed discussion on workflows.  For more information before continuing on this topic, please refer to those posts.

Given that these workflows are central to OpsMgr and that modules are central to constructing workflows, two design goals of OpsMgr are readily apparent, even from the outside looking in:

  1. Each module should be as efficient as possible.  While you can script almost anything, you should use the built-in modules (registry readers, WMI readers, etc.) wherever you can.  If need be, you can create your own Composite module and be as creative as you like; however, you should opt for the various built-in module types whenever possible.
  2. Wherever possible, each module should run as infrequently as possible.  This is known as cook down.  OpsMgr has the basic intelligence necessary to minimize module executions: whenever the effective Configuration section of a given module is identical across multiple workflows, that module will only be run once.  Its output is then used as input to the next module in line in every workflow that references the cooked-down module.  This is especially true for DataSource modules: since they have no input and sit at the beginning of the chain, they are prime candidates for cook down.  I cannot speak directly for ConditionDetection or WriteAction modules, as they have an input DataItem that would seem to nullify the ability to cook them down.  I am similarly unsure of the exact behavior of ProbeAction modules with respect to cook down.  They have a TriggerOnly attribute that, when true, obviates the need for an input DataItem, so I suspect they are candidates, but I have tested only DataSource modules.
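To make the cook-down criterion concrete, here is a minimal, hypothetical sketch (the element IDs and module type are invented for illustration) of two rules whose DataSource modules end up with identical effective configuration.  The agent would run that data source once and feed its output to both workflows:

```xml
<!-- Hypothetical: two rules targeting the same class, each using the
     same DataSourceModuleType with identical configuration after
     $Config$/$Target$ substitution.  The agent runs the shared data
     source once; its output feeds both workflows. -->
<Rule ID="Example.Rule.A" Target="Example.Class" Enabled="true">
	<DataSources>
		<DataSource ID="DS1" TypeID="Example.SharedDSM">
			<IntervalSeconds>900</IntervalSeconds>
		</DataSource>
	</DataSources>
	<!-- ...write actions... -->
</Rule>
<Rule ID="Example.Rule.B" Target="Example.Class" Enabled="true">
	<DataSources>
		<DataSource ID="DS1" TypeID="Example.SharedDSM">
			<!-- Identical effective configuration, so this cooks down. -->
			<IntervalSeconds>900</IntervalSeconds>
		</DataSource>
	</DataSources>
	<!-- ...write actions... -->
</Rule>
```

Change the IntervalSeconds of either rule (for example, via an override) and the configurations no longer match, so the data source would run twice.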

In this post, I will develop an MP that uses a scripted DataSourceModuleType that cooks down to one call, but services many workflow instances (monitors).  The example I will develop in this MP is a printer monitoring MP.  We will endeavor to have a DataSource module that runs once, returns the status of all printers, and is then used in the individual unit monitor workflows for all states of all monitors of all printers.  That is useful in and of itself, but the design pattern is absolutely critical, especially where the DataSource module performs operations that are far more expensive than obtaining printer status.

Deviating from the Defaults

Let's examine a script-based monitor created with the UI.  There are several standard features of interest that limit the extent to which you can build monitors that cook down through the UI.  While you can create a monitor that cooks down to a higher level of abstraction, you cannot have a single script be utilized in more than one monitor.  For example, using the UI, you can construct a UnitMonitor that runs once per server and returns the status for all printers, but you will still need multiple executions for multiple monitors.  In some cases, this still might be a waste of resources.  By authoring the MP directly, we will be able to build our own DataSourceModuleType that will run once per server and service multiple monitors.  This is the best design and also allows a single data source to drive monitors and performance collection rules, for example.  After we build our own, I will return to the UI-generated monitor and discuss how to build a monitor that cooks down at least partially.

First, here are the screen shots of a simple UnitMonitor created with the UI.  This monitor is not functional.  I have intentionally left the script body alone and specified deliberate values for the various parts of the monitor.  This will help us see where these values appear in the resultant XML in the management pack.

Figure 1: The script template.

[Image: UIMonitor1]

Figure 2: The script parameters.

[Image: UIMonitor2]

Figure 3: The unhealthy expression.

[Image: UIMonitor3]

Figure 4: The healthy expression.

[Image: UIMonitor4]

Here is the UnitMonitor that this generates, with some key highlights:

(Figure 5: The UI-generated UnitMonitor)

<UnitMonitor ID="UIGeneratedMonitor..." Accessibility="Public" Enabled="false"
	Target="com.focus24.PrinterMonitoring.WindowsPrinter" ParentMonitorID="Health!System.Health.AvailabilityState"
	Remotable="true" Priority="Normal" TypeID="Windows!Microsoft.Windows.TimedScript.TwoStateMonitorType" ConfirmDelivery="false">

	<Category>Custom</Category>
	<OperationalStates>
		<OperationalState ID="UIGeneratedOpStateId1..." MonitorTypeStateID="Error" HealthState="Warning"/>
		<OperationalState ID="UIGeneratedOpStateId2..." MonitorTypeStateID="Success" HealthState="Success"/>
	</OperationalStates>
	<Configuration>
		<IntervalSeconds>900</IntervalSeconds>
		<SyncTime/>
		<ScriptName>UIGeneratedMonitor.vbs</ScriptName>
		<Arguments>UIGeneratedMonitor Parameters Are Inserted Here</Arguments>
		<ScriptBody>
			...omitted; it's the simple template...
		</ScriptBody>
		<TimeoutSeconds>60</TimeoutSeconds>
		<ErrorExpression>
			<SimpleExpression>
				<ValueExpression>
					<XPathQuery Type="String">UnhealthyParameterName</XPathQuery>
				</ValueExpression>
				<Operator>Equal</Operator>
				<ValueExpression>
					<Value Type="String">UnhealthyParameterValue</Value>
				</ValueExpression>
			</SimpleExpression>
		</ErrorExpression>
		<SuccessExpression>
			<SimpleExpression>
				<ValueExpression>
					<XPathQuery Type="String">HealthyParameterName</XPathQuery>
				</ValueExpression>
				<Operator>Equal</Operator>
				<ValueExpression>
					<Value Type="String">HealthyParameterValue</Value>
				</ValueExpression>
			</SimpleExpression>
		</SuccessExpression>
	</Configuration>
</UnitMonitor>

The highlights show where the values specified in the UI appear in the MP.  I also highlighted the UnitMonitorType that this monitor utilizes.  That particular UnitMonitorType is almost entirely a pass-through for the UnitMonitor itself.  It defines two MonitorTypeState elements, declares a DataSource and two ConditionDetection modules, and utilizes those modules in the RegularDetection elements for each state.  The reason I have dubbed it "almost entirely a pass-through" is that the Configuration of the DataSource and of the two ConditionDetection modules consists simply of the $Config/...$ expressions for each element.  Ergo, everything required by the UnitMonitorType must be specified in the UnitMonitor.  As you will see, this is not the most conducive arrangement for cook down.

For the UnitMonitor itself, note the following:

  • For any Configuration element, you have all of the usual replacement expressions available to you: $MPElement$, $Target$, etc.  Following the normal pattern, you will need to pass something specifically identifiable about the $Target$ to the script, such that it can test the appropriate target.  For instance, for a logical disk monitor, you need to pass the computer name and logical disk name in order to return something meaningful from a script that is used in a monitor that targets a logical disk.
  • The script template that is inserted creates a property bag, shows examples of how to set properties in the property bag, and then returns the property bag to OpsMgr.  This is also in line with the normal pattern of one invocation returning a property bag for one target.
  • The underlying UnitMonitorType is built with ConditionDetection modules that use an XPathQuery to test the value of a particular property (like @Name="State"), setting the monitor state according to the expressions you provide (2 for a two-state monitor, 3 for a three-state monitor).  This, at least, always cooks down: the DataSource module is run once and its output is sent to both ConditionDetection modules.

If we are to create a script that can be cooked down, we will have to deviate from this in several key ways:

  • The effective Configuration of the underlying DataSource module must have some level of abstraction.  By definition, if we're hoping to cook a script down to one run for X targets, we cannot have the script rely on information specific to any single target.  Therefore, in our example, we will only want the script to know the computer name.
  • We will want several Configuration elements related to the scripted DataSource removed from the individual workflow Configuration entirely.  This is especially true for the script name and body.  While I believe it would be possible for OpsMgr to cook down two completely different workflows whose script name and body matched exactly, this would be an incredible waste of space in the management pack, especially if the script is long and complicated.  You would also have a hard time keeping the copies identical, especially if you base a large number of workflows on a single script.

Therefore, we will opt to author a single DataSourceModuleType that is used by several UnitMonitorTypes that are used by even more UnitMonitors.  We could also use the DataSourceModuleType in Rules, etc., but we will keep this example to just UnitMonitors.  Remember: the goal is to have the DataSource module run once and service every other workflow.  Let's move on to building the MP.

Building the MP Shell

We will need several sections of our MP to be built before we can even consider a single monitor.  I will not discuss these at length, but feel free to comment with any questions or observations.  I will brain dump these for your reference when you review the MP:

  • The manifest and references define the MP and set up its references and aliases.
  • The class definitions declare a printer class that derives from Microsoft.Windows.LogicalDevice.  Since this class is appropriate and already hosted, I thought it was the best candidate.  The printers hosted by a server are not necessarily connected directly to the server or even in the same building, but this is still a good choice.  I have selected some interesting properties to capture about the printer, but I'll rely on Microsoft.Windows.LogicalDevice and its host, Microsoft.Windows.Computer, to carry the declarative burden of key fields.
  • No additional relationships are needed and the dependency monitor for the Windows Server model will already roll up our status accordingly.
  • I have declared a DataSourceModuleType (.vbs) to discover the printers.  You'll note that each discovered class instance includes the key properties of its host.  This is required by OpsMgr, and it is what allows OpsMgr to generate the hosting relationships for you.
  • The DataSourceModuleType exposes its interval, timeout, and sync time as OverrideableParameters.
  • That DataSourceModuleType is used in the Discovery.  The Discovery targets Microsoft.Windows.Server.2003 computers.  I was selective in choosing 2003 because it exposes better WMI printer classes than previous versions do.  The Discovery is also disabled by default.  A group should be created for 2003 servers that are "interesting" print servers, and an override targeted to that group should be created to enable the Discovery.  This is what I feel is proper form for any management pack.  For testing purposes in a lab, you could just enable this rule.
  • In this case, you'll have to put the new group and overrides in this MP since it will not be sealed.  For a sealed MP, the group and override would need to be in some separate unsealed MP.  This is one of the main reasons to seal an MP, even if the signing and security aspects are not required: it allows updates to the core MP without affecting operational changes, such as overrides, groups, and group membership.
  • In PresentationTypes I declare an Image resource for the class I expose.  This is an 80x80 diagram PNG image.  I like the impact of having custom images for the diagram view and distributed applications.
  • In Presentation, I declare several Views: a state view, a diagram view, and a folder to put them in.
  • There are also, of course, DisplayStrings for everything.

Building the Unified Data Source

The unified data source will, ironically, be a rather simple script.  If we remember that the goal of this design pattern is to have the script do expensive work once, you will find that such scripts are commonly simple.  This holds when the expense is in retrieving the data from the source (e.g. from a very slow or expensive system or over a very slow or expensive connection).  The complexity of the script increases substantially when the cost is in calculating the data from the source, as when the data is raw and must be processed or transformed (e.g. statistical calculations, hashes, cryptographic requirements, etc.).  Here, then, is the declaration of the DataSourceModuleType for our example printer monitoring MP:

(Figure 6: The cook-down friendly DataSourceModuleType)

<DataSourceModuleType ID="com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM" Accessibility="Public">
	<Configuration>
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="IntervalSeconds" type="xsd:int" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="SyncTime" type="xsd:string" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="TimeoutSeconds" type="xsd:integer" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="PrincipalName" type="xsd:string" />
	</Configuration>
	<OverrideableParameters>
		<OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
		<OverrideableParameter ID="SyncTime" Selector="$Config/SyncTime$" ParameterType="string" />
		<OverrideableParameter ID="TimeoutSeconds" Selector="$Config/TimeoutSeconds$" ParameterType="int" />
	</OverrideableParameters>
	<ModuleImplementation>
		<Composite>
			<MemberModules>
				<DataSource ID="DS1" TypeID="Windows!Microsoft.Windows.TimedScript.PropertyBagProvider">
					<IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
					<SyncTime>$Config/SyncTime$</SyncTime>
					<ScriptName>com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM.vbs</ScriptName>
					<Arguments>$Config/PrincipalName$</Arguments>
					<ScriptBody>
Option Explicit
Dim oWbemServices, oPrinter
Dim oAPI, oBag
Set oAPI = WScript.CreateObject("MOM.ScriptAPI")
oAPI.LogScriptEvent "com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM.vbs", 1002, 0, "Printer Monitor Starting"
Set oWbemServices = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" &amp; WScript.Arguments(0) &amp; "\root\cimv2")
For Each oPrinter in oWbemServices.ExecQuery("SELECT Name,DetectedErrorState,PrinterStatus,Workoffline FROM Win32_Printer WHERE Network=false")
	Set oBag = oAPI.CreatePropertyBag()
	oBag.AddValue "DeviceID", "Printer:" &amp; oPrinter.Name &amp; ""
	oBag.AddValue "DetectedErrorState", oPrinter.DetectedErrorState &amp; ""
	oBag.AddValue "PrinterStatus", oPrinter.PrinterStatus &amp; ""
	oBag.AddValue "Workoffline", oPrinter.Workoffline &amp; ""
	oAPI.AddItem oBag
Next
oAPI.LogScriptEvent "com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM.vbs", 1003, 0, "Printer Monitor Ending"
oAPI.ReturnItems()
					</ScriptBody>
					<SecureInput />
					<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
					<EventPolicy />
				</DataSource>
			</MemberModules>
			<Composition>
				<Node ID="DS1" />
			</Composition>
		</Composite>
	</ModuleImplementation>
	<OutputType>System!System.PropertyBagData</OutputType>
</DataSourceModuleType>

Highlighted in this declaration are the following characteristics of this DataSourceModuleType that make it ready for cook down:

  • Of the possible Configuration elements of the underlying Microsoft.Windows.TimedScript.PropertyBagProvider, the only ones that are exposed and subject to change are IntervalSeconds, SyncTime, and TimeoutSeconds.  As we will see, these can still cause issues because they can be overridden; however, this is a risk that must be endured since a module that cannot be configured through overrides is bound to be problematic in any environment.
  • We have added another Configuration element, PrincipalName.  This is passed as the sole item in the Arguments element.  Giving this element a blatant name should ensure that the module is used as we require in order to take advantage of the cook down we're designing into it.
  • For the script itself, note that we are iterating through all printers and using the CreatePropertyBag method of the scripting API for each instance.
  • Note that each property bag itself contains values that you can already predict will be useful to multiple Monitors, Rules, etc.  Again, we are continuing to build on the concept that while we are accessing the source, we should gather as much information as we can to minimize the expense.
  • The per-instance property bag is added to the output using the scripting API's AddItem method, which is not a well-known method, since it does not appear in the simple template the UI provides.
  • Since there have been multiple property bag items added to the output, the script ends with a call to the ReturnItems method, which takes no parameters, rather than the usual Return with the property bag as the parameter.
  • Finally, note the LogScriptEvent method calls, which log events 1002 and 1003.  These will be important when we confirm the number of invocations of our script.
  • Taken together, these characteristics deviate from the UI-provided template and form the basis for a unified DataSourceModuleType.  The point is that the DataSource runs once and returns enough information to satisfy multiple workflows for multiple targets.
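The two mutually exclusive endings, the UI template's single-bag pattern versus the multi-bag pattern used above, can be contrasted in a skeletal form (hypothetical names; object creation and error handling omitted):

```vbscript
' oAPI is a MOM.ScriptAPI object; colInstances stands in for an
' illustrative collection, e.g. the result of a WMI query.

' Single-target pattern (the UI template): build one property bag
' for the one target and return it directly.
Set oBag = oAPI.CreatePropertyBag()
oBag.AddValue "State", "OK"
Call oAPI.Return(oBag)

' Multi-target pattern (cook-down friendly): build one property bag
' per instance, accumulate with AddItem, then emit them all at once.
For Each oInstance In colInstances
	Set oBag = oAPI.CreatePropertyBag()
	oBag.AddValue "DeviceID", oInstance.Name
	oAPI.AddItem oBag
Next
oAPI.ReturnItems    ' wraps the accumulated bags in a Collection element
```

A single script uses one pattern or the other; mixing Return and ReturnItems in the same script is not meaningful.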

The XML representation of this script's output also deviates from the traditional DataItem by wrapping multiple DataItems in a Collection element.  For example:

(Figure 7: An example Collection DataItem)

<Collection>
    <DataItem ...>
        <Property Name="DeviceID" ...>Printer:NameOfPrinter1</Property>
        <Property Name="DetectedErrorState" ...>...</Property>
    </DataItem>
    <DataItem ...>
        <Property Name="DeviceID" ...>Printer:NameOfPrinter2</Property>
        <Property Name="DetectedErrorState" ...>...</Property>
    </DataItem>
</Collection>

The behavior that makes this useful is that OpsMgr will now send each DataItem in the Collection to the next module in any workflow that utilizes this DataSource for any target, as long as the effective Configuration is the same.  For us, this means that subsequent modules for every workflow for every target on a single machine will be serviced by a single run of this DataSource.

Our next order of business will be to utilize this module in some UnitMonitorTypes.

Building the Unit Monitor Types

The design pattern for every workflow that will use our DataSourceModuleType is essentially the same: specify our module as the start and then have its output fed to ConditionDetection modules that filter the output down to just what the particular workflow and target need.  We will see concrete examples of UnitMonitorTypes, but this pattern applies to any type of workflow that can consume the output.  We'll talk through a few of these other examples after we complete the sample MP.  These examples are long, owing to the verbosity of the UnitMonitorType schema, but worth including in their entirety.  Here, then, are two UnitMonitorTypes that utilize the unified DataSourceModuleType, with some highlights we will discuss below:

(Figure 8: Two UnitMonitorTypes based on our DataSourceModuleType)

<UnitMonitorType ID="com.focus24.PrinterMonitoring.WindowsPrinter.ErrorStateUMT" Accessibility="Internal">
	<MonitorTypeStates>
		<MonitorTypeState ID="StateIsHealthy" NoDetection="false"/>
		<MonitorTypeState ID="StateIsWarning" NoDetection="false"/>
		<MonitorTypeState ID="StateIsCritical" NoDetection="false"/>
	</MonitorTypeStates>
	<Configuration>
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="IntervalSeconds" type="xsd:int" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="SyncTime" type="xsd:string" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="TimeoutSeconds" type="xsd:integer" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="PrincipalName" type="xsd:string" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="DeviceID" type="xsd:string" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="SpecificState" type="xsd:string" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="Warning0Critical1" type="xsd:string" />
	</Configuration>
	<OverrideableParameters>
		<OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
		<OverrideableParameter ID="SyncTime" Selector="$Config/SyncTime$" ParameterType="string" />
		<OverrideableParameter ID="TimeoutSeconds" Selector="$Config/TimeoutSeconds$" ParameterType="int" />
		<OverrideableParameter ID="Warning0Critical1" Selector="$Config/Warning0Critical1$" ParameterType="string" />
	</OverrideableParameters>
	<MonitorImplementation>
		<MemberModules>
			<DataSource ID="DS1" TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM">
				<IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
				<SyncTime>$Config/SyncTime$</SyncTime>
				<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
				<PrincipalName>$Config/PrincipalName$</PrincipalName>
			</DataSource>
			<ConditionDetection ID="FilterForStateIsHealthy" TypeID="System!System.ExpressionFilter">
				<Expression>
					<And>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>$Config/DeviceID$</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='DetectedErrorState']</XPathQuery>
								</ValueExpression>
								<Operator>NotEqual</Operator>
								<ValueExpression>
									<Value>$Config/SpecificState$</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
					</And>
				</Expression>
			</ConditionDetection>
			<ConditionDetection ID="FilterForStateIsWarning" TypeID="System!System.ExpressionFilter">
				<Expression>
					<And>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>$Config/DeviceID$</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='DetectedErrorState']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>$Config/SpecificState$</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<Value>$Config/Warning0Critical1$</Value>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>0</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
					</And>
				</Expression>
			</ConditionDetection>
			<ConditionDetection ID="FilterForStateIsCritical" TypeID="System!System.ExpressionFilter">
				<Expression>
					<And>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>$Config/DeviceID$</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='DetectedErrorState']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>$Config/SpecificState$</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<Value>$Config/Warning0Critical1$</Value>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>1</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
					</And>
				</Expression>
			</ConditionDetection>
		</MemberModules>
		<RegularDetections>
			<RegularDetection MonitorTypeStateID="StateIsHealthy">
				<Node ID="FilterForStateIsHealthy">
					<Node ID="DS1"/>
				</Node>
			</RegularDetection>
			<RegularDetection MonitorTypeStateID="StateIsWarning">
				<Node ID="FilterForStateIsWarning">
					<Node ID="DS1"/>
				</Node>
			</RegularDetection>
			<RegularDetection MonitorTypeStateID="StateIsCritical">
				<Node ID="FilterForStateIsCritical">
					<Node ID="DS1"/>
				</Node>
			</RegularDetection>
		</RegularDetections>
	</MonitorImplementation>
</UnitMonitorType>
<UnitMonitorType ID="com.focus24.PrinterMonitoring.WindowsPrinter.WorkingOfflineUMT" Accessibility="Internal">
	<MonitorTypeStates>
		<MonitorTypeState ID="StateIsHealthy" NoDetection="false"/>
		<MonitorTypeState ID="StateIsWarning" NoDetection="false"/>
		<MonitorTypeState ID="StateIsCritical" NoDetection="false"/>
	</MonitorTypeStates>
	<Configuration>
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="IntervalSeconds" type="xsd:int" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="SyncTime" type="xsd:string" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="TimeoutSeconds" type="xsd:integer" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="PrincipalName" type="xsd:string" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="DeviceID" type="xsd:string" />
		<xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="Warning0Critical1" type="xsd:string" />
	</Configuration>
	<OverrideableParameters>
		<OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
		<OverrideableParameter ID="SyncTime" Selector="$Config/SyncTime$" ParameterType="string" />
		<OverrideableParameter ID="TimeoutSeconds" Selector="$Config/TimeoutSeconds$" ParameterType="int" />
		<OverrideableParameter ID="Warning0Critical1" Selector="$Config/Warning0Critical1$" ParameterType="string" />
	</OverrideableParameters>
	<MonitorImplementation>
		<MemberModules>
			<DataSource ID="DS1" TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.MonitorDSM">
				<IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
				<SyncTime>$Config/SyncTime$</SyncTime>
				<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
				<PrincipalName>$Config/PrincipalName$</PrincipalName>
			</DataSource>
			<ConditionDetection ID="FilterForStateIsHealthy" TypeID="System!System.ExpressionFilter">
				<Expression>
					<And>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>$Config/DeviceID$</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='Workoffline']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>False</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
					</And>
				</Expression>
			</ConditionDetection>
			<ConditionDetection ID="FilterForStateIsWarning" TypeID="System!System.ExpressionFilter">
				<Expression>
					<And>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>$Config/DeviceID$</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='Workoffline']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>True</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<Value>$Config/Warning0Critical1$</Value>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>0</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
					</And>
				</Expression>
			</ConditionDetection>
			<ConditionDetection ID="FilterForStateIsCritical" TypeID="System!System.ExpressionFilter">
				<Expression>
					<And>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>$Config/DeviceID$</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<XPathQuery>Property[@Name='Workoffline']</XPathQuery>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>True</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
						<Expression>
							<SimpleExpression>
								<ValueExpression>
									<Value>$Config/Warning0Critical1$</Value>
								</ValueExpression>
								<Operator>Equal</Operator>
								<ValueExpression>
									<Value>1</Value>
								</ValueExpression>
							</SimpleExpression>
						</Expression>
					</And>
				</Expression>
			</ConditionDetection>
		</MemberModules>
		<RegularDetections>
			<RegularDetection MonitorTypeStateID="StateIsHealthy">
				<Node ID="FilterForStateIsHealthy">
					<Node ID="DS1"/>
				</Node>
			</RegularDetection>
			<RegularDetection MonitorTypeStateID="StateIsWarning">
				<Node ID="FilterForStateIsWarning">
					<Node ID="DS1"/>
				</Node>
			</RegularDetection>
			<RegularDetection MonitorTypeStateID="StateIsCritical">
				<Node ID="FilterForStateIsCritical">
					<Node ID="DS1"/>
				</Node>
			</RegularDetection>
		</RegularDetections>
	</MonitorImplementation>
</UnitMonitorType>

Here are the highlights:

  • First, I've added three more specific elements to the Configuration element: DeviceID, SpecificState, and Warning0Critical1.  These will be used to identify the target, the specific state to be detected, and the criticality of the condition, respectively.  Note that none of these are passed to the DataSource, so it is insulated from any variations related to the target or the particular monitor.  Again, this points to OpsMgr's ability to use a single run of the DataSource module for multiple targets and workflows.
  • I've also made Warning0Critical1 an OverrideableParameter.  Using a three state monitor with this parameter gives us the ability to control the criticality of the actual monitor, not just the alert.  The monitor's criticality is usually fixed.
  • The ConditionDetection modules for the different states are much more complicated than before.  Now, we have two or three expressions that must be And'ed together.
    • The first expression matches only DataItems related to this DeviceID (Target).
    • The second expression matches whether or not we are in the SpecificState (which is designated here as Unhealthy).  Note the difference between the Healthy ConditionDetection module and the Warning/Critical ConditionDetection modules.
    • The third expression (only on the Warning and Critical ConditionDetection modules) matches our Warning0Critical1 parameter and matches only for the appropriate criticality.
  • These design patterns follow for the second UnitMonitorType, though there is no need for a SpecificState Configuration element because it is testing a boolean condition.
  • Note that we should not use $Target$ references here.  Although they seem like they would fit, they in fact do not and their behavior is indeterminate, based on my observations.  We will use $Config$ references here and fill them with $Target$ references when we utilize the UnitMonitorType in a UnitMonitor.

The full picture of how this design pattern comes together should be starting to form at this point.  The only step left is to craft a few UnitMonitors that are based on these UnitMonitorTypes.

Building the Unit Monitors

I can keep these examples simple.  Since there are a dozen or so error states for a printer, we could create that many UnitMonitors.  We could also re-visit our DataSource script and normalize the values somewhat (i.e. return a few discrete values for which we want monitors and return anything else as a single value we could roll into an "Other Problem" UnitMonitor).  For our purposes, I will create three: two based on the first UnitMonitorType and one based on the second.  This will be another large section of management pack XML, but well worth the review.
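The normalization idea could be sketched as a small helper inside the DataSource script.  This is purely illustrative: the function name and the "Other" value are hypothetical, and the states kept (1 and 9) are the two for which we build monitors below.

```xml
<ScriptBody><![CDATA[
' Hypothetical sketch: normalize raw printer states into the few discrete
' values we monitor individually, rolling everything else into "Other".
Function NormalizeState(iRawState)
    Select Case iRawState
        Case 1, 9                    ' the states our monitors detect
            NormalizeState = iRawState
        Case Else
            NormalizeState = 99      ' hypothetical "Other Problem" value
    End Select
End Function
]]></ScriptBody>
```

An "Other Problem" UnitMonitor would then match on the single rolled-up value instead of a dozen discrete states.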

(Figure 9: Three UnitMonitors based on our two UnitMonitorTypes)

<UnitMonitor ID="com.focus24.PrinterMonitoring.WindowsPrinter.Error.Monitor"
    Accessibility="Public" Enabled="true"
    Target="com.focus24.PrinterMonitoring.WindowsPrinter"
    ParentMonitorID="Health!System.Health.AvailabilityState"
    Remotable="true" Priority="Normal"
    TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.ErrorStateUMT"
    ConfirmDelivery="false">
	<Category>AvailabilityHealth</Category>
	<AlertSettings AlertMessage="com.focus24.PrinterMonitoring.WindowsPrinter.Error.Alert">
		<AlertOnState>Warning</AlertOnState>
		<AutoResolve>true</AutoResolve>
		<AlertPriority>Normal</AlertPriority>
		<AlertSeverity>Warning</AlertSeverity>
		<AlertParameters>
			<AlertParameter1>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter1>
			<AlertParameter2>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</AlertParameter2>
		</AlertParameters>
	</AlertSettings>
	<OperationalStates>
		<OperationalState ID="ErrorCritical" MonitorTypeStateID="StateIsCritical" HealthState="Error"/>
		<OperationalState ID="ErrorWarning" MonitorTypeStateID="StateIsWarning" HealthState="Warning"/>
		<OperationalState ID="ErrorHealthy" MonitorTypeStateID="StateIsHealthy" HealthState="Success"/>
	</OperationalStates>
	<Configuration>
		<IntervalSeconds>300</IntervalSeconds>
		<SyncTime />
		<TimeoutSeconds>150</TimeoutSeconds>
		<PrincipalName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PrincipalName>
		<DeviceID>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</DeviceID>
		<SpecificState>1</SpecificState>
		<Warning0Critical1>0</Warning0Critical1>
	</Configuration>
</UnitMonitor>
<UnitMonitor ID="com.focus24.PrinterMonitoring.WindowsPrinter.Offline.Monitor"
    Accessibility="Public" Enabled="true"
    Target="com.focus24.PrinterMonitoring.WindowsPrinter"
    ParentMonitorID="Health!System.Health.AvailabilityState"
    Remotable="true" Priority="Normal"
    TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.ErrorStateUMT"
    ConfirmDelivery="false">
	<Category>AvailabilityHealth</Category>
	<AlertSettings AlertMessage="com.focus24.PrinterMonitoring.WindowsPrinter.Offline.Alert">
		<AlertOnState>Error</AlertOnState>
		<AutoResolve>true</AutoResolve>
		<AlertPriority>Normal</AlertPriority>
		<AlertSeverity>Error</AlertSeverity>
		<AlertParameters>
			<AlertParameter1>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter1>
			<AlertParameter2>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</AlertParameter2>
		</AlertParameters>
	</AlertSettings>
	<OperationalStates>
		<OperationalState ID="OfflineCritical" MonitorTypeStateID="StateIsCritical" HealthState="Error"/>
		<OperationalState ID="OfflineWarning" MonitorTypeStateID="StateIsWarning" HealthState="Warning"/>
		<OperationalState ID="OfflineHealthy" MonitorTypeStateID="StateIsHealthy" HealthState="Success"/>
	</OperationalStates>
	<Configuration>
		<IntervalSeconds>300</IntervalSeconds>
		<SyncTime />
		<TimeoutSeconds>150</TimeoutSeconds>
		<PrincipalName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PrincipalName>
		<DeviceID>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</DeviceID>
		<SpecificState>9</SpecificState>
		<Warning0Critical1>1</Warning0Critical1>
	</Configuration>
</UnitMonitor>
<UnitMonitor ID="com.focus24.PrinterMonitoring.WindowsPrinter.WorkingOffline.Monitor"
    Accessibility="Public" Enabled="true"
    Target="com.focus24.PrinterMonitoring.WindowsPrinter"
    ParentMonitorID="Health!System.Health.AvailabilityState"
    Remotable="true" Priority="Normal"
    TypeID="com.focus24.PrinterMonitoring.WindowsPrinter.WorkingOfflineUMT"
    ConfirmDelivery="false">
	<Category>AvailabilityHealth</Category>
	<AlertSettings AlertMessage="com.focus24.PrinterMonitoring.WindowsPrinter.WorkingOffline.Alert">
		<AlertOnState>Error</AlertOnState>
		<AutoResolve>true</AutoResolve>
		<AlertPriority>Normal</AlertPriority>
		<AlertSeverity>Error</AlertSeverity>
		<AlertParameters>
			<AlertParameter1>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter1>
			<AlertParameter2>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</AlertParameter2>
		</AlertParameters>
	</AlertSettings>
	<OperationalStates>
		<OperationalState ID="WorkingOfflineCritical" MonitorTypeStateID="StateIsCritical" HealthState="Error"/>
		<OperationalState ID="WorkingOfflineWarning" MonitorTypeStateID="StateIsWarning" HealthState="Warning"/>
		<OperationalState ID="WorkingOfflineHealthy" MonitorTypeStateID="StateIsHealthy" HealthState="Success"/>
	</OperationalStates>
	<Configuration>
		<IntervalSeconds>300</IntervalSeconds>
		<SyncTime />
		<TimeoutSeconds>150</TimeoutSeconds>
		<PrincipalName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PrincipalName>
		<DeviceID>$Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$</DeviceID>
		<Warning0Critical1>1</Warning0Critical1>
	</Configuration>
</UnitMonitor>

Here, the highlights are just to call your attention to how the first UnitMonitorType is utilized by two different UnitMonitors.  Those two monitors also differ in their use of the Warning0Critical1 element, which is reflected in the default AlertOnState and AlertSeverity elements.  If you follow the values specified here back through the UnitMonitorType and then back to the DataSourceModuleType, you will see that the $Target/Host/.../PrincipalName$ propagates all the way back to the DataSourceModuleType, while the $Target/.../DeviceID$ only propagates as far back as the ConditionDetection modules in the UnitMonitorType.  The same is true for the Warning0Critical1 Configuration element and, for the first two monitors, the SpecificState element.  Hence, the DataSourceModuleType will cook down to one run for all targets of all three monitors.

This essentially completes our management pack.  Now, let's see what activity this generates in a test environment.  The test environment represented here consists of two machines: SCOM-WIN2K3, an agent-managed machine with three printers installed, and SCOM-SERVER, the RMS, also with three printers installed.  For demonstration purposes, I've set one printer on each machine to Work Offline.  I've also set one printer on each machine to use a network port at 127.0.0.1, and then generated a test page, which put the printer into an error status.  Let's see that configuration first:

(Figures 10 and 11: The Printer Configuration from Each Test Server)


Next, let's see what our LogScriptEvent method call recorded on both machines (events 1002 and 1003):

(Figures 12 and 13: Event Log Excerpts from Each Test Server)


So far, so good.  Both show the new configuration becoming active and then one and only one invocation of our DataSource module.  Let's see what the state view says:

(Figure 14: The State View from OpsMgr)


Also good.  Two printers (targets) from two different servers are showing two different severities.  We can confirm that this one invocation of our script has fed multiple workflows by reviewing the alerts generated:

(Figure 15: The Alerts View from OpsMgr)


This is exactly what we expected to see, which confirms the behavior we designed for.  Finally, just because we bothered to include a diagram icon for this class:

(Figure 16: The Diagram View from OpsMgr)


Other Workflow Types

As promised, we can easily talk through how this would be implemented for different types of workflows:

  • For a performance collection Rule:
    • The unified DataSource module (or another similarly designed) produces a Collection.
    • The Collection is consumed by a ConditionDetection module to select only the DataItems related to performance metrics for a specific target.
    • Another ConditionDetection module receives that particular DataItem and maps it to a System.Performance.Data DataItem.
    • One or more WriteActions store the DataItem in the DB/DW.
    • Essentially, this is a re-implementation of the built-in scripted performance composite module that ships with OpsMgr, utilizing a unified DataSource module and an additional ConditionDetection module for filtering just before the ConditionDetection module that does the mapping.
  • For an event collection Rule:
    • The unified DataSource module (or another similarly designed) produces a Collection.
    • The Collection is consumed by a ConditionDetection module to select only the DataItems related to events for a specific target.
    • Another ConditionDetection module receives that particular DataItem and maps it to a System.Event.Data DataItem.
    • The DataItem is consumed by OpsMgr.
    • This can similarly be thought of as a re-implementation of the built-in scripted event provider, with similar changes as noted for the performance example above.
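To make the performance example concrete, here is a hedged sketch of such a composite DataSourceModuleType.  All IDs are hypothetical (including the unified DataSource TypeID and the property names in the XPath queries), and I'm assuming the System.Performance.Library is aliased as Perf:

```xml
<!-- Hypothetical sketch: composite DataSource that filters the unified
     Collection down to one target and maps the result to performance data. -->
<DataSourceModuleType ID="Example.Printer.PerfDS" Accessibility="Internal">
	<Configuration>
		<xsd:element name="IntervalSeconds" type="xsd:integer"/>
		<xsd:element name="SyncTime" type="xsd:string"/>
		<xsd:element name="TimeoutSeconds" type="xsd:integer"/>
		<xsd:element name="PrincipalName" type="xsd:string"/>
		<xsd:element name="DeviceID" type="xsd:string"/>
		<xsd:element name="CounterName" type="xsd:string"/>
	</Configuration>
	<ModuleImplementation>
		<Composite>
			<MemberModules>
				<!-- The unified DataSource: only PrincipalName and the shared
				     schedule reach it, so it cooks down across all targets. -->
				<DataSource ID="DS1" TypeID="Example.Printer.UnifiedDS">
					<IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
					<SyncTime>$Config/SyncTime$</SyncTime>
					<TimeoutSeconds>$Config/TimeoutSeconds$</TimeoutSeconds>
					<PrincipalName>$Config/PrincipalName$</PrincipalName>
				</DataSource>
				<!-- Keep only this target's DataItem from the Collection. -->
				<ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
					<Expression>
						<SimpleExpression>
							<ValueExpression>
								<XPathQuery>Property[@Name='DeviceID']</XPathQuery>
							</ValueExpression>
							<Operator>Equal</Operator>
							<ValueExpression>
								<Value>$Config/DeviceID$</Value>
							</ValueExpression>
						</SimpleExpression>
					</Expression>
				</ConditionDetection>
				<!-- Map the filtered DataItem to System.Performance.Data. -->
				<ConditionDetection ID="Mapper" TypeID="Perf!System.Performance.DataGenericMapper">
					<ObjectName>Printer</ObjectName>
					<CounterName>$Config/CounterName$</CounterName>
					<InstanceName>$Config/DeviceID$</InstanceName>
					<Value>$Data/Property[@Name='JobCount']$</Value>
				</ConditionDetection>
			</MemberModules>
			<Composition>
				<Node ID="Mapper">
					<Node ID="Filter">
						<Node ID="DS1"/>
					</Node>
				</Node>
			</Composition>
		</Composite>
	</ModuleImplementation>
	<OutputType>Perf!System.Performance.Data</OutputType>
</DataSourceModuleType>
```

A collection Rule would then reference this composite DataSource and add only the WriteActions, which also sidesteps the fact that a Rule carries at most one ConditionDetection directly.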

Revisiting the UI-Generated Monitor

Also, as promised, let's try to apply what we've seen here to the UI-generated monitor and discuss how cook down might be implemented at least for individual workflows and possibly for multiple workflows.  To realize the same type of cook down in a UI-generated script:

  • Only specify the highest level of abstraction in the arguments.  Do not include arguments based on the $Target$.  Only use arguments as granular as $Target/Host$, for example.
  • Do not use the pattern given by the example script (i.e. one property bag returned with the Return method).  Instead, access the source and return multiple property bags using the CreatePropertyBag, AddItem, and ReturnItems methods, respectively.
  • In each property bag, instead of including just a property for the state of a single target, you must include properties to identify the target (such as the DeviceID in our example).  This is required since you will be returning multiple property bags.  They are not very valuable if they cannot be differentiated!
  • For the healthy, warning, and/or critical expressions, use an And that filters based on the $Target$ and the state of that target.
  • That should make the underlying DataSource eligible for cook down.
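As a sketch of that pattern (the GetPrinters helper and the property names are hypothetical; only the MOM.ScriptAPI calls are real):

```xml
<ScriptBody><![CDATA[
' Hypothetical sketch: one property bag per target, each tagged with the
' identifying DeviceID, queued with AddItem and emitted with ReturnItems.
Dim oAPI, oBag, oPrinter
Set oAPI = CreateObject("MOM.ScriptAPI")

For Each oPrinter In GetPrinters(WScript.Arguments(0)) ' hypothetical helper
    Set oBag = oAPI.CreatePropertyBag()
    oBag.AddValue "DeviceID", oPrinter.DeviceID        ' identifies the target
    oBag.AddValue "PrinterState", oPrinter.PrinterState
    oAPI.AddItem oBag                                  ' queue this bag
Next

oAPI.ReturnItems                                       ' emit the whole Collection
]]></ScriptBody>
```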

You can even create multiple monitors that will all cook down to a single run using the UI, but this, as I mentioned earlier, is quite a hack:

  • Follow all of the guidelines above.
  • In your script, include in the property bag multiple states or measurements for each target, identifying them by property name.  Alternatively, you could include multiple property bags per target, but at least one property would need to identify the state or measurement to which each property bag applies.
  • Add another expression under the And expression to further filter the collection of property bags to just the particular state or measurement each particular monitor or rule needs.
  • Duplicate the arguments, timeout, interval, and script body exactly in each monitor or rule you create.  This is perhaps the single biggest reason I feel that this method is a complete hack.
  • In theory, this will allow OpsMgr to cook down all of the UI-created workflows together.

Limitations

The one caveat with the unified model is that you must be very careful when dealing with state that represents a "present/missing" situation.  If the presence of something generates an unhealthy condition, that is easy enough to include in the Collection.  The issue is that if, on the next run, that presence is now missing, you must be able to include an appropriate DataItem in the Collection.  In some cases, this might be difficult, especially since the unified DataSource does not have access to the inventory of targets or their current state.

For a more concrete example, consider the fictitious example of a DataSource that examines the conditions of long-running jobs.  There would need to be a Discovery that creates the instances of the jobs.  Then, any job that was flagged by the DataSource module would have a corresponding DataItem in the Collection, which would be examined by the appropriate ConditionDetection modules in the UnitMonitorType and would generate the appropriate state change.  If, on a subsequent run, the job no longer existed, the monitor would never change back to Healthy.  This is because we're assuming that the DataSource would have trouble reporting the status of a job that no longer exists.  This will cause the issue to linger in OpsMgr, especially if the Discovery run interval is much greater than the monitor's (as is usually the case).

Another example that would not be fixed even by the next Discovery run would be if a DataSource module examined the state of hundreds of persistent connections to a service.  Using another table of open transactions, the DataSource flagged any transaction that was older than a particular number of minutes.  The DataSource would ostensibly return a Collection of all connections that have at least one old transaction.  This would similarly be processed and set the state accordingly.  On subsequent runs, the DataSource would not have the context of what it had set as unhealthy before.  In this case, to compensate, you would probably want to ensure that every connection was represented in the Collection returned, defaulting all to healthy unless otherwise determined to be unhealthy.  Depending on the exact incarnation of this issue, that may or may not be a reasonable thing to do.  If it is not, you would likely need to have your DataSource be stateful somehow, by means external to OpsMgr, such that it can return to healthy anything it had previously set to unhealthy.
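The "represent every connection" compensation might be sketched like this (the helpers and property names are hypothetical):

```xml
<ScriptBody><![CDATA[
' Hypothetical sketch: every connection gets a bag, defaulting to Healthy,
' so a previously flagged connection can return to Healthy on the next run.
Dim oAPI, oBag, oConn
Set oAPI = CreateObject("MOM.ScriptAPI")

For Each oConn In GetConnections()        ' hypothetical helper
    Set oBag = oAPI.CreatePropertyBag()
    oBag.AddValue "ConnectionID", oConn.ID
    If HasOldTransaction(oConn) Then      ' hypothetical helper
        oBag.AddValue "State", "Unhealthy"
    Else
        oBag.AddValue "State", "Healthy"  ' explicit healthy DataItem
    End If
    oAPI.AddItem oBag
Next

oAPI.ReturnItems
]]></ScriptBody>
```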

This is a general limitation of the design pattern represented here, but should be easily identifiable.  Such cases will require additional work to balance the need for granular invocations with the efficiencies of cook down.

DataItem Overload

Another side-effect of using a cooked down design model that should be considered is the limitations imposed by the dropping algorithm of OpsMgr.  When there is too much data outstanding in a rule, OpsMgr will drop DataItems until the rule is no longer overloaded.  The maximum number of DataItems that may be outstanding in a rule is 128.  The overall number of outstanding DataItems for all rules active on a HealthService is 5,120.  These values are not configurable.  This behavior is by design, because most often, the presence of that many outstanding DataItems indicates a problem with the MonitoringHost.exe.  In the case of cooked down designs, this can present a problem.  Returning to our example, consider an environment that has a large number of printers installed.  By large, I mean a number much greater than 128.  When the cooked down script returns its Collection, there will be one DataItem per printer contained in it.  Those DataItems will fan out to the various receiving modules in the workflow.  Depending on the processing speed of the system on which the workflows are running, it is quite possible to exceed the 128 limit and even the 5,120 limit (depending on how many distinct iterations of this design pattern are present).  There is currently no work-around for this, except to watch for these issues (Event ID 4506 in the Operations Manager log) and further balance the design between fewer distinct invocations (producing larger Collections) and smaller Collections (produced by more distinct invocations).

The Issues with Overrides

A very simple reality can have very drastic consequences when we consider designing an MP according to this pattern.  Overrides, by definition, change the effective configuration of a workflow and can be targeted very granularly.  This can cause the unified (and expensive and monolithic) script to be invoked multiple times.  Some of those invocations could assemble the entire Collection only to be consumed by a single target.  This is most definitely a balancing act.  Be aware of the ramifications of overrides.  Document them clearly in your MP Guide.  Where possible, try to compensate with good design.

Another issue related to overrides that arises with cook down is the granularity with which you can disable the actions a workflow is taking.  If there are individual workflows for each and every instance of the target, you can prevent a specific workflow from running at all for a particular target by disabling it.  This includes the DataSource invocation against that particular target.  When you implement a cooked down version of the workflow, you can disable the specific workflow that evaluates the results of the DataSource, but the DataSource will retrieve the source data from that target regardless.  This is because by definition, the DataSource is no longer target-specific, but rather assembles the data for all targets within a particular (larger) scope.  The only scenario where this could be a problem is where the actual collection of some data point from a target is undesirable in some cases.  For example, if you had a printer whose driver caused a significant delay and system load whenever it was interrogated by WMI, you might wish to override the workflow against that particular printer.  Although you can override the workflow that evaluates the data returned from the DataSource for that particular target, the data point will still be retrieved.  As you can probably imagine, the design pattern that allows for cook down and for granular exclusion could become quite complex. This is another trade-off, so be aware of the ramifications of this in your design.

Conclusion

Designing quality management packs involves balancing a fairly broad mix of requirements.  Modeling your class hierarchy to represent what you need to manage and still fit into the OpsMgr paradigm is the first step.  Planning to utilize everything OpsMgr has to offer is another important step (e.g. Distributed Applications, Reports, etc.).  Intelligently instrumenting whatever it is you are managing such that OpsMgr reports relevant, timely, and "right-sized" amounts of information is usually the step on which we concentrate the most.  Perhaps the most important step, though, is to design your MP to live harmoniously alongside other MPs in environments where the scope of what your MP monitors may be a small part of the larger environment.  To that end, I offer this treatment of cook down for the general good.

Comments, as always, are welcomed and appreciated.

VJD/<><

You can download the whole MP here.

Posted by vdipippo | with no comments

Relationships in OpsMgr 2007, Part 1

Abstract

Of the two EntityTypes in the OpsMgr 2007 schema, ClassTypes and RelationshipTypes, the latter seems to be a straight-forward concept, but can have interesting ramifications in its application.  Because of this, I decided to investigate them more fully, which is the subject of this series.  In this first part, I will discuss the declaration of relationship types in a management pack and investigate some of the ramifications and restrictions related to the various base classes.  In subsequent parts, I will investigate how relationships are involved in OpsMgr's facilities and some nuances that can help explain some of OpsMgr's less obvious behavior.

Background

Relationships are at the foundation of many OpsMgr facilities:

  • Hosting
  • Groups
  • Distributed Applications
  • Reporting
  • Alerting
  • Dependency Monitors

As you can see, a majority of the key facilities in OpsMgr rely on relationships.

Declaration

First, let's discuss the declarative side of a relationship.  More accurately, relationship types are declared; instances of that relationship type are discovered.  The declaration of a RelationshipType is very simple, as shown in the MP snippets below:

<RelationshipType ID="System.Reference" Accessibility="Public" Abstract="true">
    <Source>System.Entity</Source>
    <Target>System.Entity</Target>
</RelationshipType>
<RelationshipType ID="System.Containment" Accessibility="Public" Abstract="true" Base="System.Reference">
    <Source>System.Entity</Source>
    <Target>System.Entity</Target>
</RelationshipType>

You may or may not recognize these two from the System.Library management pack.  The System.Reference relationship type is the base of all relationship types, which explains why it has no Base attribute.  System.Containment is another base relationship type, but derives from System.Reference.  It is interesting to note that all relationship types must derive from an abstract relationship type, and abstract relationship types can only be declared in System.Library.  That essentially limits the lineage of a relationship type to two, three, or four generations through one of three possible parents, respectively:

System.Reference to Your.Derived.Type

System.Reference to System.Containment to Your.Derived.Type

System.Reference to System.Containment to System.Hosting to Your.Derived.Type

There is a fourth RelationshipType declared in System.Library: System.WatchedBy (a.k.a. Perspective).  This is not an abstract type and therefore is not available for derivation.

Examining the definitions, we can see that RelationshipType elements follow the schema for declaring an "accessible" MP Element and add two attributes specific to themselves.  The standard attributes are ID, Comment (which is not often used), and Accessibility.  The two additional attributes for RelationshipType elements are Abstract and Base.  I mentioned that Comment is rare.  One regular use of this attribute, albeit a bit off topic, is in Rules.  Many Rule elements have the GUIDs for the MOM 2005 objects from which they came recorded in the Comment attribute.  There are a few other examples, but this attribute is probably one of the least used, from my observations.

For child elements of the RelationshipType element, there are three flavors: Source, Target, and Property.  You will find the great majority of relationship types are declared with just Source and Target, which can appear only once.  These are at the heart of what a relationship is, exactly.

Both Source and Target specify the ClassTypes that make up the two ends of the RelationshipType.  These follow the standard notation of [Alias!]ID, such as System!System.Entity (if we were in another management pack, where we're assuming that the System.Library has been aliased as System) or, in this case, just System.Entity (because this element, the RelationshipType, and that element, the ClassType, are declared in the same management pack).  Whatever the relationship represents, it represents it between an object of class Source and an object of class Target.  This is fairly straight-forward, but we can talk through a few examples:

  • For the System.Containment relationship type, it declares that some System.Entity (Source) can contain some other System.Entity (Target).
  • For the System.Hosting relationship type, it declares that some System.Entity (Source) hosts some other System.Entity (Target).
  • Taking another example from the AD management pack:
<RelationshipType ID="Microsoft.Windows.Server.2003.AD.DomainControllerRoleHostsNtFrs" Accessibility="Public" Abstract="false" Base="System!System.Hosting">
  <Source>Microsoft.Windows.Server.2003.AD.DomainControllerRole</Source>
  <Target>Microsoft.Windows.Server.2003.AD.DC.NtFrs</Target>
</RelationshipType>
  • This RelationshipType declares that an AD.DomainControllerRole hosts AD.DC.NtFrs.  The ID (and DisplayString), Source, and Target of this relationship are all well-chosen and describe the relationship between the types very well.

Before we leave the discussion about the declaration of a RelationshipType, it is worth noting that, while rare, some very good examples of RelationshipTypes with Property declarations exist in common MPs.  For example, the AD management pack declares this relationship:

<RelationshipType ID="Microsoft.Windows.Server.AD.Site.contains.Microsoft.Windows.Server.DC.Computer" Accessibility="Public" Abstract="false" Base="System!System.Containment">
  <Source>Microsoft.Windows.Server.AD.Site</Source>
  <Target>Windows!Microsoft.Windows.Server.DC.Computer</Target>
  <Property ID="IsBridgeheadIP" Type="string" CaseSensitive="false" Length="255"/>
  <Property ID="IsBridgeheadSMTP" Type="string" CaseSensitive="false" Length="255"/>
</RelationshipType> 

I think this is an excellent example (which also doesn't use the Comment attribute, if you'll notice).  In this case, whether a particular DC is the IP or SMTP replication bridgehead for a particular site is stored as properties of the relationship between the site and its DCs.  I think that the tendency would be to make these properties of the DC.Computer class itself.  While you might be able to pull this off in this example, any many-to-many relationship would need to store relevant properties in the relationship, where they belong.  Consider this adapted example:

<RelationshipType ID="Example.WebServer.Hosts.WebSite" ... Base="System!System.Hosting">
  <Source>Example.WebServer</Source>
  <Target>Example.WebSite</Target>
  <Property ID="IsPreferredServer" Type="string" CaseSensitive="false" Length="1"/>
</RelationshipType>

Clearly, in this case, you cannot mark the web server as being a preferred server, since it could host multiple web sites, for some of which it may not be the preferred server.  It would also not be very elegant to have some comma-delimited property of the web site that listed all preferred servers.  This is an example of where a RelationshipType declared with Property elements answers some important service modeling needs.

Restrictions and Ramifications

In spite of being relatively straight-forward to declare, relationship types have some interesting restrictions on and ramifications of their definitions:

  • For relationships that derive from System.Reference:
    • References are generic and can form relationships between just about any combination of classes.  Children can reference parents, parents can reference children, they can be reciprocal (class A references class B while class B also references class A), reflexive (class A can reference class A), and many-to-many.  In terms of relationship types that cause OpsMgr to perform in some manner, there are far more that derive from System.Containment and System.Hosting than from System.Reference.  Many reference relationships are meaningful only in the context of the management pack in which they are declared.  That notwithstanding, there are some very important uses of System.Reference relationships intrinsic to OpsMgr.  One absolutely ubiquitous example is the Microsoft.SystemCenter.HealthServiceShouldManageEntity relationship.  Another genre is the series of relationships formed when using the Distributed Application designer to connect component groups together.
  • For relationships that derive from System.Containment:
    • Containment relationships are just as freely declared as reference relationships, but some incarnations can have adverse effects on OpsMgr.  I have found that certain ViewTypes, especially state and diagram, tend to become very unpredictable if relationships form loops, etc.  In short, it has been my experience that containment relationships should be declared to represent as clean a containment model as possible.  In other words, it should be a rarity that class A contains class B while class B also contains class A, etc.  Although there are not as many restrictions on containment relationships (versus hosting), care should still be taken to model these succinctly.
  • For relationships that derive from System.Hosting:
    • There are many restrictions on hosting relationships.  This is because many things come into play when two classes are related by hosting.  First, if the host is deleted, the hosted classes are also deleted, so you have a referential integrity issue.  Also, the health service that runs workflows related to the host will also run workflows related to the hosted classes.  Finally, and perhaps most importantly, a hosting relationship is baked right into the schema of a class.  Hosting relationships are discovered automatically and you cannot even discover an instance of a hosted class without also discovering for that instance the key properties of its host and all of the hosts up the chain.  In the database, the underlying database objects for a class will contain the key fields of its hosts.  With these types of behind-the-scenes activities, it is clear to see why there are so many restrictions on the declaration of hosting relationships.
    • Restrictions on the source ClassType
      • The source does not need to be hosted itself, but it can be.
      • The source must be non-abstract.
      • The source cannot be the same as the target.
      • The source can be from any management pack.
    • Restrictions on the target ClassType
      • The target ClassType definition must be in the same management pack as the RelationshipType definition.
      • The target must be a hosted class type or derive from a class type that is hosted.
      • The target must not be the target of any other hosting relationship.
      • The target can be abstract or non-abstract.
    • Restrictions on using the Hosted attribute of a ClassType
      • If a class is hosted, it must have a hosting relationship defined for it or derive from a hosted class (which would need its own relationship type).
      • Once a ClassType is declared with the Hosted attribute set to true, it must remain consistent on all classes derived from that class.
    • Other restrictions:
      • A class can be the target of one and only one hosting relationship.  This follows the class through its descendants (i.e. a derived class cannot be the target of a hosting relationship different from that of one of its base classes).
      • The source and target cannot derive from some common class if that class has any key properties defined.  This is slightly different from the previous restrictions in that it is still not allowed even if the common ancestor is not hosted.  That is, it is acceptable to have Class B hosting Class C, and have them both derive from Class A if the relationship does not violate any of the aforementioned restrictions; however, if Class A contains key fields, the MP will fail on import.  Oddly enough, it passes MPVerify, but when it is imported, the system will try to create Class C with two sets of identical key properties (i.e. columns): its own set of key properties inherited from Class A and the set of key properties from its host, Class B, which are also inherited from Class A.  OpsMgr apparently uses a hash to create a "decorated" column name for properties of a given class that is calculated from the class and property names.  Since the contributing class name (Class A) and the properties (the keys on Class A) are identical, the MP fails to import, throwing a SQL error related to the attempt to create a table or view with duplicate column names.  I've actually run into this problem in a real example, so I hope that this is something that is addressed in a future release; however, this restriction is currently required.
      • There can be no circular hosting references, where A hosts B, B hosts C, and then C hosts A.
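
To ground these restrictions, here is a minimal sketch of how a hosted class and its hosting relationship are declared in MP XML.  This is a hypothetical example: all IDs are invented, and the System and Windows aliases are assumed to be references to the System.Library and Microsoft.Windows.Library management packs.

```xml
<ClassTypes>
  <!-- Hosted="true" bakes hosting into the class schema itself -->
  <ClassType ID="MyMP.Widget" Base="System!System.LogicalEntity"
             Accessibility="Public" Abstract="false" Hosted="true" Singleton="false" />
</ClassTypes>
<RelationshipTypes>
  <!-- Must be defined in the same MP as the target ClassType (MyMP.Widget) -->
  <RelationshipType ID="MyMP.ComputerHostsWidget" Base="System!System.Hosting"
                    Accessibility="Public" Abstract="false">
    <Source>Windows!Microsoft.Windows.Computer</Source>
    <Target>MyMP.Widget</Target>
  </RelationshipType>
</RelationshipTypes>
```

Note how the declaration satisfies the restrictions above: the target lives in the same MP as the RelationshipType, it is the target of exactly one hosting relationship, and the Hosted attribute on the class matches the existence of that relationship.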

As you can see, the discussion of what can and cannot be done with hosting relationships is deceptively complex.  That's due in part to the fact that, in the purest sense of the words, there is but a shade of difference between containment and hosting; because of everything a hosting relationship brings about in OpsMgr, however, there is a great deal more to consider.

For the most part, if the relationships are kept relatively simple, you will not run into these restrictions.  The most common situation that brings you into contact with them is when you are attempting to integrate some existing, robust object model into the OpsMgr model.  If you are simply extending the existing concepts around which OpsMgr has already been built (servers, applications, services, etc.), you should have a much easier time than this discussion may lead you to believe.  Nonetheless, it is interesting to test the limits of what OpsMgr will allow and perhaps even more interesting to see what OpsMgr will allow that perhaps it should not!

In the next post, we will begin to see how the different types of relationships affect OpsMgr behavior.

VJD/<><

Posted by vdipippo | with no comments

Workflow Tracer 1 of 4: Workflow Primer

(Links to parts: Part 1, Part 2, Part 3, Part 4)

I have written a managed code module for tracing the data items moving through a workflow.  This is a four-part article that presents the module in the fourth part.  If you are familiar with workflows, modules, and data items (the first three posts, respectively), you can go straight to the fourth article.  If not, please read on…  Even if you are familiar with these topics, you may enjoy this presentation of them and I would certainly welcome and appreciate any feedback.

The heart and soul of OpsMgr 2007 is undoubtedly the workflow engine.  This is one of the most basic functions of the Health Service: to implement the policy received from the management group by running the workflows described therein.  Discoveries, rules, and monitors are all workflows.  Each is therefore composed of modules that are run in a particular order to achieve the desired result.  Consider an example of each:

    • A discovery might be comprised of a timer module that executes a script at some interval.  The script, in turn, pulls some WMI data (for example).  That “data item” may then be passed to a condition detection module that evaluates it against some condition (e.g. that the WMI data contains certain properties).  Provided those conditions are met, the data item may then be passed to another condition detection module that maps the WMI data item to another data item in the specific OpsMgr discovery data format for a particular class (e.g. Windows.Computer).  This is essentially the output of the sequence.  This sequence effectively results in an instance of some class being created or updated.  That’s a workflow.

    • Another example would be a monitor.  A monitor might consist of a timer module that runs a script on a schedule.  The script in this case, in turn, pulls some instrumentation data from some interesting data source (e.g. the management software that interfaces with the building environmental monitoring system).  That data item is then passed to any number of different sequences of modules.  This is very well defined: one sequence for each possible state of the object (healthy, warning, critical, etc.).  It is assumed that only one of these sequences will pass all of the conditions and generate output data at any given moment.  The sequence that generates output data tells OpsMgr to which state the targeted object should transition (if it is in some other state).  This is another workflow.

    • The final example is a rule.  A rule will usually start in a similar way to the other two examples: some trigger or timer starts a sequence of modules in motion.  At some point, some data source or probe will produce some data item.  Any number of condition detection modules may manipulate the data item, transform it, or suppress it.  In the case of a rule, any number of “write action” modules will then accept the data item and do something with it.  For example, just about every rule includes one of the following four write actions: generate alert, set some monitor’s state, write to the operational DB, or write to the data warehouse.  Rules are not required to use these nor is that by any means an exhaustive list.  This is just meant to give you an idea of how a rule is a collection of modules.  And it’s also a workflow.
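
To make that last example concrete, here is a simplified sketch of what such a rule looks like in MP XML.  The rule and module IDs are invented; the Windows and SC aliases are assumed to reference the Microsoft.Windows.Library and Microsoft.SystemCenter.Library management packs, and the module configurations are abbreviated.

```xml
<Rule ID="MyMP.SampleEventCollectionRule" Enabled="true"
      Target="Windows!Microsoft.Windows.Computer" ConfirmDelivery="false">
  <Category>EventCollection</Category>
  <DataSources>
    <!-- The data source produces one event data item per new Application log entry -->
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
      <LogName>Application</LogName>
    </DataSource>
  </DataSources>
  <WriteActions>
    <!-- The write actions persist each data item to the operational DB and the warehouse -->
    <WriteAction ID="WriteToDB" TypeID="SC!Microsoft.SystemCenter.CollectEvent" />
    <WriteAction ID="WriteToDW" TypeID="SC!Microsoft.SystemCenter.DataWarehouse.PublishEventData" />
  </WriteActions>
</Rule>
```

The Rule element maps directly onto the workflow concepts above: a pipeline starting at a data source and ending at one or more write actions.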

      In fact, every major monitoring- and operations-related activity of the Health Service is a workflow.  These include all discoveries, monitors, diagnostics, recoveries, rules, and agent tasks.

      Understanding workflows is key to understanding OpsMgr 2007.  You should understand what they are, that most major functions of OpsMgr are workflows, and that a workflow is essentially a pipeline of modules.

      In part 2 of this series, I will discuss those modules in more detail.

      VJD/<><

      (Links to parts: Part 1, Part 2, Part 3, Part 4)

      Posted by vdipippo | with no comments

      Workflow Tracer 2 of 4: Module Primer

      (Links to parts: Part 1, Part 2, Part 3, Part 4)

      Now that I’ve given a primer on workflows, it is important to discuss modules a bit.  The following discussion is generalized.  Modules are nothing if not powerful, flexible, and completely open-ended.  Therefore, while I am confident that this discussion will be conceptually helpful, understand that there can be deviations from these generalizations.

      First, it’s worth noting that there are four basic module types: DataSource, ProbeAction, ConditionDetection, and WriteAction.  This is also the order in which they appear, both as children of the ModuleTypes element in the management pack and as children of the MemberModules element of a Composite module.

      • DataSource and ProbeAction modules essentially do some work to collect some attributes of some entity and produce a DataItem.  These modules can be thought of as operating on entities external to OpsMgr itself in a manner that does not change them (i.e. read-only) for the express purpose of bringing the “facts” about their current state of configuration or operation into OpsMgr’s universe.  A DataItem produced by a DataSource or ProbeAction is the unit of information that starts that process.
      • ConditionDetection modules are essentially filters that consume a DataItem as input and either manipulate it or check it against some conditions.  ConditionDetection modules are always defined with an OutputType, but depending on their purpose and the contents of the input, they may or may not output a DataItem on any given occasion.  When a ConditionDetection module produces a DataItem, it will be passed on to the next module in the workflow, either unchanged (e.g. as is the case with the expression filters) or transformed (e.g. as is the case with the discovery mappers).  When a ConditionDetection module does not produce a DataItem, the workflow ends (e.g. as is the case when a test condition is not met).  These modules can be thought of as operating on the DataItem produced by the DataSource or ProbeAction module.  They are usually internal to the workflow, meaning that they only consider the input and their purpose and configuration.  They do not usually use data external to the data flowing through the workflow.  I reiterate, though, that they do not strictly need to be simple filters that can only see the DataItem passed in or only affect the DataItem passed out.  Some do quite a bit more than that.  Good examples of this are the ConditionDetection modules involved in calculating delta values and baselines.  They do operate on incoming DataItems and produce outgoing DataItems, but they also hold state themselves.
      • WriteAction modules receive a DataItem and do something active with it, like write it to the database, generate an alert, or otherwise use its contents to affect some other entity.  These are formally defined as the only modules that will change something external to OpsMgr.  This is one reason why you don’t normally see a WriteAction module in a discovery or monitor.  WriteAction modules can optionally produce output.

      See the following diagram for a visual treatment of the four module types.

      Slide1

      This, therefore, gives a lifecycle of a workflow: from the creation of a DataItem, through manipulations, to the final action a DataItem causes.   Of course, to continue painting a picture of how these modules are used, I would also note that discoveries and monitors do not usually contain write actions.  The nature of a discovery or monitor implies that their end state will be internal to OpsMgr: a new or updated instance or a change in state, respectively.  In those cases, then, it is OpsMgr itself that consumes the final output of the workflow.

      The next two paragraphs contain very rich information.  Based on feedback, I have added a few diagrams to help understand these concepts.  I suggest an initial read of these two paragraphs, followed by a review of the diagrams, and then a quick re-read.

      To make matters even more interesting, the definition/implementation of any given module can be one of three forms: native code (implemented as a COM .DLL), managed code (implemented as a .NET assembly .DLL), or a Composite module.  A Composite module is itself a pre-defined sequence of other modules.  Composite modules are the most interesting to research, since a Composite can be defined as the sequence of several modules, each of which may be a Composite itself, each of which yet again may be a Composite, etc.  Eventually, all module implementations will end at native or managed code, but it can be quite a ride getting there.  Since any given module will be of one of four module types implemented in one of three possible ways, how truly open-ended a module’s functionality can be should be self-evident.  I have written a tool for examining such lineages in OpsMgr, which you can read about here.  It actually started out as a tool that would only explore the lineage of modules, but I later expanded it to include class types, relationship types, and data types.

      This also skews the clean definitions and characterizations of the four module types I discussed above.  Consider this: a DataSourceModuleType may be defined as a Composite of a DataSource, a ProbeAction, and a ConditionDetection module.  Another DataSourceModuleType may use that DataSource module in its own definition of a Composite that includes another ProbeAction and another ConditionDetection module.  That would make the final DataSourceModuleType a DataSource, ProbeAction, ConditionDetection, ProbeAction, and finally ConditionDetection sequence.  That final DataSourceModuleType might then be used in a workflow for a Discovery rule.  Although only certain combinations are allowed for each composite type (i.e. a ConditionDetection module may not come between a DataSource and ProbeAction in a Composite DataSourceModuleType), it’s otherwise completely flexible.
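
      The nesting just described can be sketched in MP XML.  This is a hypothetical example: the MyMP module type IDs are invented, the member module configurations are abbreviated, and the System and Windows aliases are assumed to reference the System.Library and Microsoft.Windows.Library MPs.

```xml
<!-- Inner Composite: a scheduler plus a script probe, rolled up as one DataSource -->
<DataSourceModuleType ID="MyMP.ScheduledScriptDS" Accessibility="Internal">
  <Configuration />
  <ModuleImplementation>
    <Composite>
      <MemberModules>
        <DataSource ID="Sched" TypeID="System!System.Scheduler">
          <Scheduler>
            <SimpleReccuringSchedule> <!-- element name spelled per the schema -->
              <Interval Unit="Seconds">300</Interval>
            </SimpleReccuringSchedule>
            <ExcludeDates />
          </Scheduler>
        </DataSource>
        <ProbeAction ID="Script" TypeID="Windows!Microsoft.Windows.ScriptPropertyBagProbe">
          <!-- script name, body, timeout, etc. -->
        </ProbeAction>
      </MemberModules>
      <Composition>
        <Node ID="Script">
          <Node ID="Sched" />
        </Node>
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>System!System.PropertyBagData</OutputType>
</DataSourceModuleType>

<!-- Outer Composite: reuses the inner DataSource and appends an expression filter -->
<DataSourceModuleType ID="MyMP.FilteredScheduledScriptDS" Accessibility="Internal">
  <Configuration />
  <ModuleImplementation>
    <Composite>
      <MemberModules>
        <DataSource ID="DS" TypeID="MyMP.ScheduledScriptDS" />
        <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
          <!-- expression over the property bag contents -->
        </ConditionDetection>
      </MemberModules>
      <Composition>
        <Node ID="Filter">
          <Node ID="DS" />
        </Node>
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>System!System.PropertyBagData</OutputType>
</DataSourceModuleType>
```

      Once all composites are unrolled, the effective sequence of the outer module is Scheduler → Script Probe → Expression Filter, which mirrors the DataSource/ProbeAction/ConditionDetection chaining described above.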

      The following diagrams help to present this material more visually.

      The first diagram presents some basic modules that will be used in the examples that follow:

      Slide2

      The next diagram depicts how Composite modules are created and eventually used in a workflow:

      Slide3

      The next diagram shows a more detailed view of the workflow presented above (the Rule):

      Slide4

      The next diagram shows how the Composite modules may also be used with other modules to form a UnitMonitorType.  This example is a two-state monitor (Healthy/Warning or Healthy/Critical):

      Slide5

      This final example extends the previous example to be a three-state monitor (Healthy/Warning/Critical):

      Slide6

      Obviously, it can get fairly complicated.  On the other hand, it is very conducive to good design habits: making small, focused bits of functionality that are then re-used where they are needed.

      Next in part 3, we will examine these DataItems a bit more closely.

      VJD/<><

      (Links to parts: Part 1, Part 2, Part 3, Part 4)

      Posted by vdipippo | with no comments

      Workflow Tracer 3 of 4: Data Item Primer

      (Links to parts: Part 1, Part 2, Part 3, Part 4)

      Now that we have discussed how workflows are the heart and soul of OpsMgr and modules are the building blocks of workflows, let’s discuss what runs through all of this plumbing: Data Items.

      A data item is defined in a management pack via a DataType definition in the TypeDefinitions section. These DataTypes follow an inheritance model such that all data items eventually derive from System.BaseData. Data types are a little tricky to investigate, as their definition and XML format is tucked away in their implementation, which is always in native or managed code. Outside of that code, however, they are represented as XML document fragments. An example would be:

      <DataItem time="…" sourceHealthServiceId="…" type="System.BaseData" />

      In fact, this is all that’s needed for the basic Data Item. It is a single node with three required attributes: the time it was created, the source health service id, and the type of data item it is. The remainder of the schema is defined by the specific DataType and is buried inside the native or managed code that defines it. For instance, if we were so inclined, we could create a DataType called “My.Data.Item”. It could possibly look something like this:

      <DataItem time="…" sourceHealthServiceId="…" type="My.Data.Item">
          <MyProps>
              <MyProp1>Value1</MyProp1>
          </MyProps>
      </DataItem>

      As you can probably guess, all children of the DataItem element are subject to the specific requirements of the DataType set forth in the .DLL that implements the data type. You can see, then, how a data source or probe starts a workflow off: it retrieves some data externally and wraps it in some type of data item. A property bag is a common one for script monitors. A discovery data item is common for discoveries, naturally. A performance data item is common for scripted performance collectors. The list goes on.

      Incidentally, you can probably now see where the XPath expressions come from that we use in monitors, etc.: DataItem/Property[@Name="Status"] would correspond to something like this:

      <DataItem time="…" sourceHealthServiceId="…" type="System.PropertyBagData">
          <Property Name="Status">SomeStatus</Property>
      </DataItem>

      …and would resolve to “SomeStatus.” Since these XPath expressions are used so often in monitoring, alerting, scripting, and just about everywhere else, knowledge of the XML representation of a DataItem of any given DataType is extremely helpful.

      At this point, we can also return to our discussion of module types and give another clue as to the differentiation between them:

      • DataSource modules have only an output type. They do not have an input.
      • ProbeAction modules have a single input and a single output. They could just simply be written to manipulate the input to produce the output, but they were intended to access an external resource, based on the input, that will lead to what becomes the output. There are also special types of ProbeAction modules that have their TriggerOnly attribute set to true.  They do not have an input DataItem.
      • ConditionDetection modules have multiple inputs and a single output. The inputs are specified as a sequence of InputTypes, but often there is only one in the sequence.
      • WriteAction modules always have an input but only optionally have an output. The output is usually the result of the write action, not a continuation of the workflow.
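
      Those distinctions are visible directly in the module type declarations themselves.  The following is a schematic sketch (invented IDs, implementations and configurations elided, element ordering simplified):

```xml
<!-- DataSource: output only, no input -->
<DataSourceModuleType ID="MyMP.SomeDS" Accessibility="Internal">
  <Configuration />
  <ModuleImplementation>…</ModuleImplementation>
  <OutputType>System!System.BaseData</OutputType>
</DataSourceModuleType>

<!-- ProbeAction: a single input and a single output -->
<ProbeActionModuleType ID="MyMP.SomeProbe" Accessibility="Internal">
  <Configuration />
  <ModuleImplementation>…</ModuleImplementation>
  <OutputType>System!System.BaseData</OutputType>
  <InputType>System!System.BaseData</InputType>
</ProbeActionModuleType>

<!-- ConditionDetection: a sequence of inputs (often just one) and a single output -->
<ConditionDetectionModuleType ID="MyMP.SomeFilter" Accessibility="Internal">
  <Configuration />
  <ModuleImplementation>…</ModuleImplementation>
  <OutputType>System!System.BaseData</OutputType>
  <InputTypes>
    <InputType>System!System.BaseData</InputType>
  </InputTypes>
</ConditionDetectionModuleType>

<!-- WriteAction: a single input; the OutputType element is optional -->
<WriteActionModuleType ID="MyMP.SomeWriter" Accessibility="Internal">
  <Configuration />
  <ModuleImplementation>…</ModuleImplementation>
  <InputType>System!System.BaseData</InputType>
</WriteActionModuleType>
```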

      In summary:

      • DataItems are the units of work that are processed by modules as part of a workflow.
      • They are defined as DataType elements under the TypeDefinitions element of a management pack.
      • They follow an inheritance model, deriving eventually from System.BaseData.
      • They are implemented in native or managed code, where their format and XML representation are defined.

      Understanding the different types of data items involved in a workflow and their respective XML representation is the definitive prerequisite to understanding how to piece modules together to construct workflows for your own management pack.

      This essentially brings us to the central point of this series: the Workflow Tracer module that I will present in part 4, the final post in this series.

      VJD/<><

      (Links to parts: Part 1, Part 2, Part 3, Part 4)

      Posted by vdipippo | with no comments

      Workflow Tracer 4 of 4: The Tracer Module

      (Links to parts: Part 1, Part 2, Part 3, Part 4)

      In the previous three posts, I discussed workflows, modules, and data items. I also stated that understanding the data items involved in various module interactions and their respective XML representations is sorely needed. You can accomplish this by digging through the system MPs and then through the disassembly of managed code, but that’s not a recommended course of action or a reasonable requirement. Also, beyond the general structure of the various types of data items, it is also important to see the actual contents on occasion. If you have a workflow that never produces output even though it appears as if the conditions, etc. in the workflow should result in something, a window into the values that are being processed by a particular module at a particular time would also be very helpful. This is standard fare for a solid debugging process supported by the platform’s tools. OpsMgr, unfortunately, does not yet have a good MP/workflow debugging facility.

      I have written a managed code module and an associated management pack that helps with this issue. The theory of operation of the module is simple enough: it is a condition detection module that receives a data item and then passes it on to the next module in the workflow unchanged. The only action it takes is to log the data item passed to it to the Application Event Log.

      Using this module is simple.

      It must first be installed: copy the .dll to the OpsMgr installation directory on the computer(s) on which you intend to run the trace. This will likely be C:\Program Files\System Center Operations Manager 2007\. Then, import the sealed MP downloadable below. Also, I’m assuming this will be done in a test lab. That’s my recommendation, anyway…

      Second, it must be incorporated into the management pack in which the module you are tracing lives. It must be inserted into the module sequence of the composite module you are attempting to trace.

      First, add a reference to it in your management pack:

      <Reference Alias="f24">
          <ID>com.focus24.Scom.Modules</ID>
          <Version>24.1.0.230</Version>
          <PublicKeyToken>5be9fb627d5adfbf</PublicKeyToken>
      </Reference>

      Next, add its required elements to your composite module’s Configuration element:

      <Configuration>
          …(your other stuff)…
          <xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="TraceElement" type="xsd:string" />
          <xsd:element xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="TraceTarget" type="xsd:string" />
      </Configuration>

      There is one more element that we need to specify for the tracer module, but it will not be under the Configuration element of your module. Instead, this will be specified directly when we add it to the MemberModules element. This is because you may want to insert the module into the workflow multiple times, to trace a DataItem through different stages of the workflow.

      Therefore, next, add it any number of times under the MemberModules element, with the corresponding ordering under the Composition element:

      <MemberModules>
          <DataSource ID="DS1">
              …your data source…
          </DataSource>
          <ConditionDetection ID="Trace1" TypeID="f24!com.focus24.Scom.Modules.WorkflowTracer">
              <TraceElement>$Config/TraceElement$</TraceElement>
              <TraceTarget>$Config/TraceTarget$</TraceTarget>
              <TraceStage>Text to Identify Stage, like "Data Source to Exp Filter"</TraceStage>
          </ConditionDetection>
          <ConditionDetection ID="Filter1">
              …your expression filter…
          </ConditionDetection>
          <ConditionDetection ID="Trace2" TypeID="f24!com.focus24.Scom.Modules.WorkflowTracer">
              <TraceElement>$Config/TraceElement$</TraceElement>
              <TraceTarget>$Config/TraceTarget$</TraceTarget>
              <TraceStage>Exp Filter to Discovery Mapper</TraceStage>
          </ConditionDetection>
          <ConditionDetection ID="Mapper1">
              …your discovery mapper…
          </ConditionDetection>
          <ConditionDetection ID="Trace3" TypeID="f24!com.focus24.Scom.Modules.WorkflowTracer">
              <TraceElement>$Config/TraceElement$</TraceElement>
              <TraceTarget>$Config/TraceTarget$</TraceTarget>
              <TraceStage>Discovery Mapper to Final Output</TraceStage>
          </ConditionDetection>
      </MemberModules>

      <Composition>
          <Node ID="Trace3">
              <Node ID="Mapper1">
                  <Node ID="Trace2">
                      <Node ID="Filter1">
                          <Node ID="Trace1">
                              <Node ID="DS1"/>
                          </Node>
                      </Node>
                  </Node>
              </Node>
          </Node>
      </Composition>

      …and that’s about it. When the management pack is next imported, the new workflow will begin running. Check the Operations Manager log for any module errors first. Then, check the Application log for the workflow trace events. Here is a sample:

      I sincerely hope you find this module useful. Please let me know if you encounter any issues or have any other feedback on this module or this series.

      VJD/<><

      Download:

      USE THIS IN A TEST ENVIRONMENT ONLY.

      com.focus24.Scom.Modules_build231.zip

      (Links to parts: Part 1, Part 2, Part 3, Part 4)

      Posted by vdipippo | with no comments

      ConfigMgr MP Policy Checker

      When client policy is created in ConfigMgr, it is distributed to clients via Management Points.  Client policy can be anything from client agent settings and site-wide settings that affect client activity (such as communication ports) to, most commonly, advertisements.  The mechanics that happen behind the scenes to get the policy to the client are, generally, as follows:

      • The policy is created at the server, either by an SDK/Scripting application or the console.
      • The policy in the database is compared against the site assignments, collection assignments, etc., to create effective policy assignments (i.e. which policy is applicable to which clients).
      • The policy and assignments are replicated down to all child sites and MPs.
      • When clients check for new or updated policy, they receive it in the form of an HTTP data pull from the MP.
      • The HTTP payload is an XML document that wraps MOF data.
      • The MOF information is unwrapped and becomes new or updated instances of ConfigMgr WMI classes.
      • The ConfigMgr client's activities are essentially driven by instances of those WMI classes.

      In a large environment, especially one with rate-limited addresses between sites, the part that can be problematic to research is the replication piece.  The replication of policy can back up behind other jobs.  Also, network issues can cause some policy instructions to be discarded.  This can cause a situation where the policy looks fine at the console but is not being picked up by clients.  The solution to this type of problem is usually fairly simple: a quick change to the policy will cause it to replicate again.  There's usually some cosmetic change that can be made that doesn't fundamentally alter the policy, and it can be changed back after the problem has corrected itself.

      There is a tool in the old SMS Toolkit 2 that allows you to connect to an MP to pull policy, but it is a manual process and is very daunting to use when trying to ascertain the extent of an issue or the readiness of policy in a large environment.  The latter is an important sanity check that I have always found useful before a large deployment, etc.

      To facilitate this testing, I have written a tool that will allow you to select a policy and then check that policy via HTTP pulls to all of the MPs.  The tool is multi-threaded for performance and works well in large environments.  The use of this tool should be fairly straightforward.  Here are the screen shots (with certain identifying parts blacked out):

      First, enter the central primary server's FQDN and database name, and then click "Load Policies."

      image1

      The policies will be listed in alphabetical order by policy ID.

      image2

      Next, select a policy by clicking on it in the grid.  This puts the Policy ID in the Pattern field and changes the button to "Check This Policy."

      image3

      Clicking on "Check This Policy" starts the HTTP checks against every MP.  The checks run on 5 independent threads, a number that is hard-coded into the tool.  This seems to be a good balance of concurrency and performance.  Each Status cell will be a yellow Req(thread#), a red Failed, or a green Passed.  Details for failed requests will be shown in the last column (not shown here, but in the grid to the right).  One more status is a yellow Aborted, which will show if you stop the checks with active threads.  When all of the checks are done, you must click Stop Checks to reset the tool.

      image4

      As always, there are caveats and unfinished business with the tool:

      • The tool connects to SQL with Integrated Security.  It does remember the last FQDN and database name you've used, but no SQL authentication is possible at this point.
      • The HTTP checks do not use an FQDN.  They use the NetBIOS server names from the SQL database.  This should not be a problem if the tool is run on a system with the necessary DNS suffix settings.
      • The initial policies that you will see are all GUIDs.  While some of these will be present on every MP, many are specific to a particular primary site, secondary site, etc.  Though they appear in the database, they will fail to verify on all MPs.  This is normal and depends on the policy the GUID represents.  The screen shot above shows advertisement policies (ADVERT-PACKAGE-PROGRAM).  These are the ones that are most useful to check.
      • There are some cases where older policies have version mismatches on some MPs that will cause them to fail the check, but are not problems.  This is most often because a policy existed on the system before a new site was added.  The new site's version numbers may not match the older sites' version numbers.  The tool does not handle this yet, but for most infrastructures, this isn't a common occurrence.
      • You always have to click Stop Checks to restart the test.  The functionality missing is another thread that frees the UI but keeps watch over the other threads and resets the UI when they end.  It seems like a bunch of work to avoid a single click, even though it's a definite piece of unfinished business.

      Please let me know your experience with this tool.  I find it extremely useful for testing whether advertisement policy is out to all MPs, especially.  This is another tool that we have used extensively with all of our ConfigMgr customers.  I am eager to see how it fares "in the wild."

      You can download build 1073 here.

      Enjoy!

      Vin/<><

      Posted by vdipippo | with no comments

      ConfigMgr Collection Query Rule Embedded Reference Checker

      Abstract

      I have written a tool to check the embedded references in ConfigMgr query-based collection rules.

       

      Background

      A very common practice in designing query rules for ConfigMgr collections is to use the SMS_FullCollectionMembership WMI class to effect the inclusion or exclusion of members of other collections into the collection for which the query rule is being written.  There are numerous examples and discussions about this practice on other blogs, but briefly, here's an involved scenario with the example query that meets the need:

      • You have some generic collections:
        • You have a collection that you design that uses the software inventory to include all systems on which .NET Framework 2.0 is installed.  Let's say this is collection ID LAB00001.
        • You have another collection that does something similar for MSXML 2.0, collection ID LAB00002.
        • You have a collection that contains any system with less than 1G of memory installed, collection ID LAB00003.
        • You have a collection that contains any system that is under the control of the EMEA desktop group, collection ID LAB00004.
      • You need to create a collection for a software roll-out that the EMEA desktop group can manage, but that will not allow them to advertise to anyone that does not meet the requirements.  You might have guessed it: .NET 2.0, MSXML 2.0, at least 1G of memory, and within their locus of control.
      • Instead of duplicating the effort in each collection, you want to re-use the existing collections.
      • This also is a best practice because if you decide to refine the queries that drive the membership of any of those collections, you would only need to update it in one place.
      • You consider using the "Limit Collection Membership" feature, but realize that (a) it can only limit the new collection's membership against one other collection, whereas here we need four, and (b) you cannot use it to meet one of the requirements: excluding the membership of another collection.
      • To solve this issue, you create two collections.  To one collection (e.g. LAB00005) you grant the EMEA team rights.  To the other collection, you advertise the software and use the following query rule that uses the SMS_FullCollectionMembership WMI class:

      SELECT * FROM SMS_R_System

WHERE

ResourceID IN (SELECT ResourceID FROM SMS_FullCollectionMembership WHERE CollectionID="LAB00001")

      AND ResourceID IN (SELECT ResourceID FROM SMS_FullCollectionMembership WHERE CollectionID="LAB00002")

      AND ResourceID NOT IN (SELECT ResourceID FROM SMS_FullCollectionMembership WHERE CollectionID="LAB00003")

      AND ResourceID IN (SELECT ResourceID FROM SMS_FullCollectionMembership WHERE CollectionID="LAB00004")

      AND ResourceID IN (SELECT ResourceID FROM SMS_FullCollectionMembership WHERE CollectionID="LAB00005")

      • This produces a collection whose members have .NET 2.0, MSXML 2.0, are not in the <1G memory collection, are in the collection of systems under EMEA control, and are in the collection EMEA has been provided to control the distribution of this software release.
• Note that in all of these cases, you could have used the WMI class that is created specifically for each collection.  I don't think this is the common practice, but it achieves the same result.  This is important to point out, as the point comes up later in this post.
        • (i.e. the first sub-select could have been SELECT ResourceID FROM SMS_CM_RES_COLL_LAB00001).

       

      The Problem

While this is a powerful technique and essentially something you can't do without in any active ConfigMgr environment, it is difficult to manage.  With a large number of collections that use this technique, and with the certainty of change in the "generic" collections I mentioned above, keeping track of these references can be difficult.  The purpose of a collection may change (e.g. ".NET 2.0" becomes ".NET 2.0 or higher," where something that required the collection when it was only .NET 2.0 is not compatible with versions above 2.0), and the name may change with it.  A collection may be deleted and re-created for any number of reasons, which will not raise any alarm, as embedded references are not tracked by the ConfigMgr Admin UI.

In spite of these issues, there is no facility for obtaining an overview of query rules with these references or, even more pressing, for decoding the references to check that they still point to an existing, appropriate collection.

       

      The Tool

      I have written a tool that provides for both of these requirements: an overview and a reference checker.  The tool uses integrated SQL security to access the database views directly.  It is a console application that is invoked with two command line parameters: the server name and the SMS_??? database name.

      The tool enumerates all existing site codes, collections, and query rules on that server.  It then uses regular expressions to detect probable collection references (i.e. any site code followed by 5 hex digits).  There is also a special case for the built-in collections, which do not correspond to a site code (starting instead with SMS) and do not conform to the strict standard for user-created collections (e.g. SMSDM001, SMS000GS, etc.)  Finally, it cross-references those collection IDs to the actual list of collections and includes them by ID and name for review.  Any collection ID that is found in a rule but not in the collections enumerated from the server is reported as "**** Unknown ****".  You can search the output for this to identify bad references.
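To illustrate the detection step, here is a minimal Python sketch of the pattern-matching approach described above.  The regular expressions are illustrative approximations, not the tool's actual patterns; the built-in-collection pattern in particular, and the uppercase-only matching, are assumptions.

```python
import re

def find_collection_refs(query_text, site_codes):
    """Find probable collection-ID references in a WQL query rule.

    A user-created collection ID is a known site code followed by five
    hex digits; built-in collections start with SMS (e.g. SMSDM001,
    SMS000GS).  The built-in pattern below is an assumption for
    illustration, and matching is assumed to be uppercase-only.
    """
    sites = "|".join(re.escape(s) for s in site_codes)
    user_pat = re.compile(r"\b(?:%s)[0-9A-F]{5}\b" % sites)
    builtin_pat = re.compile(r"\bSMS[0-9A-Z]{5}\b")
    return sorted(set(user_pat.findall(query_text)) |
                  set(builtin_pat.findall(query_text)))

rule = ('SELECT * FROM SMS_R_System WHERE ResourceID IN '
        '(SELECT ResourceID FROM SMS_FullCollectionMembership '
        'WHERE CollectionID="LAB00001")')
print(find_collection_refs(rule, ["LAB"]))  # ['LAB00001']
```

Note that class names like SMS_R_System and SMS_FullCollectionMembership do not trip the built-in pattern, because the underscore falls outside the character class.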

       

      Here's an example run:

[Screenshot: CheckCollRef1]

       

      ...and the output file...

[Screenshot: CheckCollRef2]

       

      I think the tool and the output should be fairly self-explanatory from there.

      Notes on the limitations of this tool:

• We have a fairly large number of collections that utilize references at many client sites.  I've used this tool against all of them with no problems.  That having been said, there is still a possibility that some of you may have rules that fool the regular expressions I am using.  That highlights the limitation of pattern matching versus parsing; this tool does only the former.  I would be interested to see any instances of this, so please let me know.  The most likely candidate would be a system or user naming convention that is close (or identical) to the collection ID patterns in ConfigMgr.  If that were the case, collection query rules that reference such names could be mistaken for collection references.
      • This will not catch more generic references, like WHERE CollectionID LIKE "LAB0001%".  I'm guessing that if anyone is using these, it is a rare case.
      • It also will not catch references that are intrinsically broken, like an invalid Collection ID (e.g. LAB000X0 or LAB00000121).
      • It will not catch references to any Collection ID that does not correspond to any known site code.  Besides obvious typos, I can envision a valid scenario where this would be needed, but it would be very rare:
        • You have a child primary site 999.
        • You create a collection at that site (999?????).
        • You use that collection ID in a reference in a query rule of a collection built at a parent site.  This reference would return no ResourceIDs except when run on that particular child primary and any of its children.  This is where I say it's probably a very, very rare case.
        • You then delete site 999.
        • Any references to 999 collections are now broken, but since there's no longer a reference to site 999 in the database, the tool will not catch these.
• It will not catch references to collections that are specified using the SMS_CM_RES_COLL_<CollectionID> classes.  I actually had this in the tool, but removed it: the underlying query that ConfigMgr generates uses these classes to effect the "Limit Collection Membership To" functionality.  This meant that the tool identified not only query rules with embedded references but also every query rule for a collection with a "Limit To" collection ID specified.  Since the latter is much more common and also double-checked by the Admin UI, I left their detection out of the tool.  As I mentioned above, I believe the most common practice is to use the SMS_FullCollectionMembership class, so this shouldn't pose much of an issue.
      • It will not catch references to other named objects such as packages, advertisements, etc.  While technically possible to use these in a WQL query, I'm guessing these, too, are rare.

      All of these are probably rare.  If you have an appreciable number of instances of any of these exceptions, please let me know.  I'd be interested in working to evolve the tool to handle them accordingly.  Overall, though, I think you will find this to be a very useful tool.

       

       You can download build 15 of CheckCollRef here.

       

      VJD/<><

      Posted by vdipippo | with no comments

      ConfigMgr Package Distribution Health Browser

Anyone that manages a ConfigMgr environment with any depth to its hierarchy usually spends a fair amount of time waiting for minijobs to replicate changes down the hierarchy.  Though not immediate, policy such as site control changes, advertisements, client agent configuration (inventory, discovery, etc.), and so forth do not usually keep you waiting too long to verify the success or failure of the replication down to the bottom of the tree.  One notable exception to this is package replication.  Especially for large packages, it is not uncommon for replication to take several hours, particularly if senders are configured with rate limits.  In an environment with a large number of packages and/or packages that update frequently, it can be quite an organizational challenge to find, check, and remedy replication issues on a regular basis.  This is even more true if you manage the ConfigMgr infrastructure but others create and maintain packages.

While the console is quite functional for checking the distribution status of packages in general, I wrote a tool that helps administrators zero in on package replication issues much more quickly.  First, it accesses the associated views in the database directly, which makes it much faster to get at this information than the console, which is a front end to the ConfigMgr WMI provider.  Second, it offers some sorting and filtering options that highlight replication issues much more efficiently than reviewing the total status of every package individually.

      Here's the initial screen:

[Screenshot: pdhb1]

      The File menu is simple enough.  It contains Connect and Exit.  If you click the Load button without using File | Connect, it also shows you the connect parameters (shown below).  The interface is also simple enough:

      • The left-hand pane (1) shows what you are investigating, controlled by the radio buttons (3).  If you investigate by Package (the default), it will show you packages on the left.  If you investigate by DP, it will show DPs on the left.
      • Clicking on anything in the left-hand pane shows the assignments that are in any of the selected assignment statuses in the right-hand pane (2), which are controlled by the three checkboxes on the right (5).
        • The default is to show only abnormal assignments.
        • If you are investigating by Package, clicking on a Package on the left shows all DPs that are in the selected statuses on the right.
        • If you are investigating by DP, clicking on a DP on the left shows any packages on that DP that are in the selected statuses on the right.
        • Normal and Abnormal are fairly self-explanatory, but "Missing" assignments may not be.  It is a quick way to see where packages are not assigned.  This is helpful for confirming whether the desired packages are on any new DPs.
      • The checkboxes on the left (4) determine the order of the list boxes.
  • The sort order applies regardless of which side the particular item appears on.
        • Sorting DPs by Site Code is the default.  If this is un-checked, the DPs are sorted by server name.
        • Sorting Packages by Name is not the default.  The default is to sort packages by Package ID.
      • The final check box determines whether the tool pre-filters the left-hand list.  This is one of the central characteristics of this tool.
        • This is checked by default.
        • With this checked, only items in the left-hand list that have at least one assignment that would show in the right-hand list will be shown.
        • With this un-checked, all items in the left-hand list are shown.  Here, it is likely that you will click on something in the left-hand list and see nothing in the right-hand list.

So, to wrap up, let me give you a couple of scenarios for the behavior of this tool:

      • With the defaults, loading from a DB will show packages in the left-hand list, sorted by Package ID, that have at least one DP that has an abnormal status.  Clicking on a Package ID would show all DPs to which that package is assigned with that abnormal status, sorted by Site Code.
      • If you change to Investigate by DP, it will show DPs in the left-hand list, sorted by Site Code, that have at least one package that has an abnormal status.  Clicking on a DP would show all Packages assigned to that DP with that abnormal status, sorted by Package ID.
      • If you switch back to Investigate by Package, un-check Show Abnormal... and check Show Missing..., you will see a list of packages that are not assigned to at least one DP.  This is usually normal, but with some packages you know need to be global, you can quickly see the issue and to which DP they need to be assigned.
      • With any of these, if you un-check Pre-Filter..., you will see all packages or all DPs in the left-hand list.
      • Etcetera, etcetera, etcetera!
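To make the pre-filter behavior concrete, here is a minimal Python sketch of the filtering rule.  The data model and sample assignment records are hypothetical; the real tool reads assignment status from the ConfigMgr database views.

```python
# Hypothetical assignment records: each links a package to a DP with a status.
assignments = [
    {"package": "LAB00010", "dp": "DP1", "status": "normal"},
    {"package": "LAB00010", "dp": "DP2", "status": "abnormal"},
    {"package": "LAB00011", "dp": "DP1", "status": "normal"},
]

def prefilter(investigate_by, statuses):
    """Return left-hand items with at least one assignment in `statuses`."""
    key = "package" if investigate_by == "package" else "dp"
    return sorted({a[key] for a in assignments if a["status"] in statuses})

print(prefilter("package", {"abnormal"}))  # ['LAB00010']
print(prefilter("dp", {"abnormal"}))       # ['DP2']
```

With pre-filtering off, the left-hand list would simply show every package (or DP) regardless of status, which is why you can then click items that show nothing on the right.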

      Here's a snapshot of the tool in action at one of our customers.  I've blacked out the identifying information.

[Screenshot: pdhb3]

      As promised, here is the connect box, which should be self-explanatory:

[Screenshot: pdhb2]

      Please let me know what you think, if you encounter any issues with this tool, or have any suggestions.

      You can download build 16 here.

      Enjoy!

      Vin/<><

      Posted by vdipippo | 1 comment(s)

      OpsMgr DW Timeouts and TableCount Utility

I recently fixed a persistent problem at one of our clients: the DW write actions were timing out.  After running the profiler and examining the underlying SQL, I found a few tables that were taking an inordinate amount of time to do simple operations.  This seemed to be a clear case of a fragmentation problem.  The biggest issue was that tables like ManagementPackVersion, with about 200 rows, were taking over 3 minutes just to do a SELECT COUNT(*).  The remedy for these tables was to rebuild the indices using ALTER INDEX ALL ON <tableName> REBUILD.  Please check the options and caveats regarding that command in the SQL documentation.

      After fixing the tables that were causing the immediate issues, I wrote a small utility (TableCount.exe) that does a table count on all the tables in a database and reports the number of milliseconds that the operation takes on each.  This identified a few other tables that needed to have their indices rebuilt as well.  The utility takes two parameters: the server name and the database name.  It assumes integrated security at this point.  Here's some sample output (note the named instance of SQL):

      C:\>TableCount OM07DB\TESTDWDB OperationsManagerDW

      [Alert].[AlertResolutionState_135D9B096BF24191AD5E04D6C100DA4F]: 88738 row(s) retrieved in 15 ms
      [dbo].[ManagementPackVersion]: 208 row(s) retrieved in 0 ms
      [dbo].[StandardDatasetAggregationHistory]: 1589 row(s) retrieved in 0 ms
      [Event].[Event_0A191F9853CD4927B76640AB8A8157F0]: 1801878 row(s) retrieved in 62 ms

      ...

      C:\>TableCount OM07DB\TESTOPDB OperationsManager

      [dbo].[DomainTableStatisticsUpdateHistory]: 1245 row(s) retrieved in 0 ms
      [dbo].[MT_Computer]: 286 row(s) retrieved in 0 ms
      [dbo].[MT_NetworkAdapter_0]: 929 row(s) retrieved in 0 ms
      [dbo].[PerformanceData_27]: 1672410 row(s) retrieved in 312 ms
      [dbo].[PerformanceData_28]: 1662302 row(s) retrieved in 218 ms
      [dbo].[PerformanceData_29]: 1755413 row(s) retrieved in 594 ms
      [dbo].[PerformanceData_30]: 1786974 row(s) retrieved in 359 ms
      [dbo].[PerformanceSignatureData]: 193512 row(s) retrieved in 31 ms
      [dbo].[PublisherMessages]: 1251885 row(s) retrieved in 156 ms
      [dbo].[StateChangeEvent]: 338924 row(s) retrieved in 31 ms

      ...

There are other forensic techniques that you will find useful in this situation, especially querying the DMVs in SQL to see the actual issues, but those are well-documented elsewhere.  Also, the SELECT COUNT(*) operation is not sufficient for detecting every form of this issue.  Nonetheless, this utility is pretty handy for getting the counts and the count times in a single, quick snapshot.  It certainly found the problems in this customer's database, plus one other so far...
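The core of what TableCount does can be sketched in a few lines.  This sketch uses SQLite so it is self-contained and runnable anywhere; the real utility connects to SQL Server with integrated security, and the table name here is just for the demonstration.

```python
import sqlite3
import time

def table_counts(conn):
    """Run SELECT COUNT(*) on every table, timing each in milliseconds."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
    results = []
    for (name,) in tables:
        start = time.perf_counter()
        (count,) = conn.execute('SELECT COUNT(*) FROM "%s"' % name).fetchone()
        ms = int((time.perf_counter() - start) * 1000)
        results.append((name, count, ms))
    return results

# Build a throwaway table to demonstrate the output format.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ManagementPackVersion (id INTEGER)")
conn.executemany("INSERT INTO ManagementPackVersion VALUES (?)",
                 [(i,) for i in range(208)])
results = table_counts(conn)
for name, count, ms in results:
    print("[%s]: %d row(s) retrieved in %d ms" % (name, count, ms))
```

An abnormally long time for a small table, as with the 200-row table above, is the signal that an index rebuild is worth investigating.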

      Feedback, as always, is most welcomed.

      You can download build 2 of TableCount here.

      VJD/<><

      Posted by vdipippo | with no comments

      Returning to the Blog!

In looking back at my blog, I can't believe my last post was in July of 2008.  I assure you that this is against my earnest desire.  As with anything, I don't think I've successfully integrated blogging into my routine.  I suppose at 6 months that is an understatement!

      That's about all I have to add to this particular post!  I will instead endeavor to post content from the work we do in System Center more frequently.

      Thanks to all for the positive feedback we continue to get on the Lineage Explorer and the Workflow Primer module.

      I hope to hear from you all again soon...

      Vin/<><

      Posted by vdipippo | with no comments

      Build 316 of Lineage Explorer

      I've had some time to get a new build of the Lineage Explorer out.  Build 316 is the latest.

      If you aren't already familiar with this tool, check out the original post: OpsMgr Lineage Explorer.

      Build 316 adds the following functionality:

      1.  There's a new status bar that will show the load progress for the OpsMgr environment.  This is useful for environments with a large number of management packs.  This also doubles as an indication of which MP element type is currently populating the tree view.

      2. Data Types are now processed, so you can explore their lineage as well.  Also, Class Types are selected for display first.  Build 227 selected Module Types by default.  Finally, there's a Filter by Example feature, though you can only turn it off using the View menu.  More on that later.

3. I've added a property grid for all MP element types and gone a little further into which properties are included.  Build 227 showed only a few properties, and only for module types.

      4. There is a Filter by Example feature that allows you to right-click on any node in the tree view and select it to be the filter (by example).  The two options are Any and Direct.  "Any" will filter the tree view, including only those MP Elements that have the element you selected anywhere in their lineage.  "Direct" includes only those MP Elements that are direct descendants of the element you select.

Here's another screen shot of what the tree view looks like after selecting "Filter by Example (Any)" on System.CommandExecuter.  Note that the View menu now shows Filter by Example with a check mark; this is how you turn it off.  Also note that if I had selected "Filter by Example (Direct)," none of the elements in this screen shot would have been visible, because they are descendants of descendants of System.CommandExecuter, not direct descendants.  Had Microsoft.Windows.ScriptWriteAction been in this screen shot, it would have been included under either option, because it is a direct descendant of System.CommandExecuter.
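The Any/Direct distinction amounts to a walk up a parent map, which can be sketched as follows.  Only the ScriptWriteAction relationship below comes from the discussion above; Example.GrandchildModule is a hypothetical name for illustration.

```python
# Parent map: element -> the element it directly derives from.
# Example.GrandchildModule is hypothetical.
parents = {
    "Microsoft.Windows.ScriptWriteAction": "System.CommandExecuter",
    "Example.GrandchildModule": "Microsoft.Windows.ScriptWriteAction",
}

def lineage(elem):
    """Every ancestor of elem, nearest first."""
    chain = []
    while elem in parents:
        elem = parents[elem]
        chain.append(elem)
    return chain

def matches(elem, example, mode):
    if mode == "Direct":
        return parents.get(elem) == example  # direct descendant only
    return example in lineage(elem)          # "Any": anywhere in lineage

print(matches("Microsoft.Windows.ScriptWriteAction",
              "System.CommandExecuter", "Direct"))  # True
print(matches("Example.GrandchildModule",
              "System.CommandExecuter", "Direct"))  # False
print(matches("Example.GrandchildModule",
              "System.CommandExecuter", "Any"))     # True
```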

      There are more improvements in the works.  Most notable is the ability to view the lineage of monitor types.  This one is a little trickier, because each of the regular detections has its own lineage.  That makes it a single MP Element type with multiple lineages.

      After that, it will have complete coverage of the <TypeDefinitions> section.  Then, it's on to the <Monitoring> section to allow the lineage of the components of monitoring MP Elements to be displayed.  This functionality will be similar to the monitor type handling: an MP Element with multiple lineages.

      The other to-dos from the original post are still on the list.

      Version 316 can be downloaded here: LineageExplorer316.zip.

      Thanks for the feedback thus far.  Keep it coming!

      VJD/<><

      Posted by vdipippo | 2 comment(s)

      OpsMgr Scripting: WYWINNWYG

      "What You Write Is Not Necessarily What You Get"

      Abstract

      OpsMgr 2007 has a fairly substantial library of discovery module types and monitor types that ship with it.  Even with that library, it seems that the single most important extensibility point in OpsMgr today is the ability to use custom scripts to perform discovery and monitoring tasks.  As I've developed scripts for OpsMgr, I have learned a great deal about how OpsMgr processes scripts.  This article describes some important concepts regarding how scripts are processed by OpsMgr.

      Besides the Execution Environment...

      There is an important distinction between the execution environment of a script and the production of the script itself.

The execution environment is fairly straightforward: VBScript or JScript that runs through the script processor.  In fact, if you trace back the various script modules, you will find that they all derive from a handful of native modules, all of which essentially execute cscript.exe.  All of the capabilities of that scripting engine and the facilities it can access are at your disposal (ADSI, WMI, ADO, the remainder of what's in COM, etc.)  We also use a COM object in our OpsMgr scripts that is on every system that has the OpsMgr Agent installed, namely "MOM.ScriptAPI".  This essentially provides functionality to generate well-formed XML DataItems with the appropriate format, attributes, etc.  We'll see that later.  While that's a simplification of what "MOM.ScriptAPI" does, the point is that it is not part of some custom script execution environment that runs in-process with OpsMgr.  It is just another COM object that the usual cscript.exe environment instantiates at our request.

      The production of the script is quite a bit more interesting, so I'll spend the majority of this article addressing it.

      Sample 1: Expansion of Embedded Selectors

      First, let's define a simple script, which we'll put inside a two-state script unit monitor.  This script targets Microsoft.Windows.NTService.  I've also created two service classes from the MP templates in OpsMgr, one for Browser and one for SNMP.  This virtually guarantees that OpsMgr will discover at least one on any given system.  My test system has both.

      Figure 1: The script itself.  (FYI - The $...$ selector that is not entirely in view is "$Target/Host/Property[Type="System!System.Entity"]/DisplayName$")

Figures 2a and 2b: The unhealthy and healthy expressions.  These are somewhat unrelated to the topic of this article, but I thought I'd add this for general purposes.  It is useful to see this and to note that any time you see wizard pages that resemble these, you are using the UI to express the configuration of the System.ExpressionFilter condition detection module.  In this case, I've defined a monitor that always returns healthy, regardless of what the script does.  You can experiment with this further, but keep in mind that you can specify just about anything for either side of each expression (i.e. row on the wizard page), including the entire gamut of $...$ selectors.  ...end aside...

      Returning to the main topic, anyone that is a script developer has a particular paradigm in mind.  The script is written and runs as written.  If you've ever seen constructions that use the $...$ selectors inline (e.g. SomeVar = "$MPElement...$"), you might have assumed that there was something special about the script execution environment that somehow resolves those.  The reality is that those resolutions are done before the script is ever executed.  Therefore, we have a very tricky paradigm on our hands: the execution environment is completely standard, but the script body itself is "refined" by OpsMgr before it is embodied in an actual file and run.
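Conceptually, the "refinement" amounts to plain text substitution performed before the script file is ever written to disk.  Here is a minimal sketch; the resolved value is hypothetical, and in reality OpsMgr resolves each selector from the active configuration for each target.

```python
def expand(script_body, selector_values):
    """Replace each $...$ selector with its resolved value, the way
    OpsMgr does before the script file is materialized and run."""
    for selector, value in selector_values.items():
        script_body = script_body.replace(selector, value)
    return script_body

selector = '$Target/Property[Type="System!System.Entity"]/DisplayName$'
script = "' Host display name: " + selector
print(expand(script, {selector: "Browser"}))
# ' Host display name: Browser
```

The point is that the scripting engine never sees a selector at all; by the time cscript.exe runs, only the resolved text remains.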

      Let's investigate this.  Our example script does nothing of interest except return a static property bag.  That will be somewhat useful later, but for now, the really interesting part is not what the script does, but what the script becomes.

      To find the actual file that is eventually run, we need to get to the Health Service State directory and find our script.  We will actually find two versions of the script, because it is generated once for each target against which the script will run (in this case, Browser and SNMP).

      Figure 3a: Finding the script location.

      Figure 3b: What's finally in the first instance.

      Figure 3c: What's finally in the second instance.

      As you can see from the figures, everything that relates to the execution context of the script is not present in the execution environment per se, but baked into the script itself as it is readied to run.  This includes information about the target, the MPElement on whose behalf the script is being invoked, any MPElement we may need (which I grant you is usually only required for discovery scripts), and information about the target's host (which is also information from the $Target...$ selector family).

      It is useful to understand this mechanism for two reasons:

      1. You do not need to put everything into the arguments of the script that may seem necessary.  The script will be customized per target.  This opens up a wide array of information that can be used in the script without worrying about the command line formatting.
      2. It is contrary to our normal thought process for scripting.  Understanding it provides a better picture of how your script will be invoked and it also reveals how these OpsMgr-specialty-laden scripts can be debugged.

In practice, I always include a comment line at the beginning of my scripts that includes the first four items in our sample script: the basic identification of the Target and MPElement.  I find that to be extremely useful in tracking down script errors.

      Sample 2: More Embedded Selectors: From <Configuration>, Including Overrides

      Continuing our example, let's set some overrides that will make the two instances unique with respect to their configuration as well as their target.

      Figure 4a: Overrides targeted to the specific service object Browser (on the test system).

      Figure 4b: Overrides targeted to the specific service object SNMP (on the test system).

      I also did two other things:

      1. I added a few additional lines to the script
        ' Some Config Values
        '   Arguments: $Config/Arguments$
        '   Interval Seconds: $Config/IntervalSeconds$
        ' A service-specific property:
        '   $Target/Property[Type="MicrosoftSystemCenterNTServiceLibrary!Microsoft.SystemCenter.NTService"]/ServiceProcessName$
      2. I created another monitor that is almost identical to the first one.  The script itself is actually identical, but this second monitor defines the script in a UnitMonitorType and uses that custom UnitMonitorType in the UnitMonitor.  By default, script monitors created by the UI use a built-in UnitMonitorType, which means that the script body itself is defined in the UnitMonitor.  There is a subtle difference that we will discuss shortly.

      Once the new configuration becomes active, we follow the same pattern to find our scripts and see what's in them.

      Figure 5a: Finding the script location.  There will be four now: the original and the UnitMonitorType variant (UMT) for both instances.

      Figure 5b and 5c: The contents of the two updated but UI-defined scripts.

Note that the additional property of the target came through fine, but the $Config$ selectors were unchanged.  This means that with a UI-defined script, the $Config$ selectors are not expanded.  This applies both to the "natural" configuration items and to any overrides.

What's going on here?  Let's check out the UMT version.

      Figure 5d and 5e: The contents of the two updated, UMT-defined scripts.

Now, that's what we're looking for.  I must admit that while I have a reasonable explanation for this, I cannot give you a definitive answer.  I'll leave that for a developer who knows the exact inner workings of OpsMgr, but it appears that the $Config$ selectors are only expanded when module and monitor types are assembled for use in a workflow.

Think of it this way: when something defined in the <TypeDefinitions> section of an MP is assembled to be used in the <Monitoring> section of an MP, $Config$ selectors are expanded.  The UI-defined monitor defines the script body in the <Monitoring> section.  It uses a built-in module that itself defines the script body as whatever is in the <Configuration> of the actual monitor $MPElement$, which is defined in the <Monitoring> section.  Therefore, our UI-defined script body is never really subjected to assembly in that way.  The UMT version, with the script defined in the UMT (under <TypeDefinitions>), is assembled for use in the corresponding monitor, so we see the expansion for it.  Both are then assembled for use against the various targets that exist, which is why everything else expands in either case.

      Important Notes About Expansion

      There are a few additional important notes on this expansion business:

      1. You must use a script argument for anything that changes between invocations of the script.  The script is created only when a new configuration becomes active.  That only happens when management packs change.  There's a myriad of things that cause a configuration change, such as group membership rules, new monitors, new subscriptions, overrides, etc., but there are an equally large number of changes to the environment that do not cause a configuration change.  Some of these include group membership updates that are the result of a dynamic rule, target property changes that are the result of discovery runs, etc.  Suffice it to say that a script can run a multitude of times under the same configuration.  This is particularly important for:
        1. $Data$ selectors.  There's no way to expand a $Data$ selector when the script instance is created.  These are pulled from the contents of the <DataItem> that is passed into the module executing the script and are therefore only available as script arguments.  If you've never seen these, they're pretty useful.  For any timed script,  my favorite is $Data/@time$.  You can pass that as an argument to have the trigger time of the monitor at your disposal, in the appropriate OpsMgr <DataItem> attribute format.
        2. $Target$ selectors.  We saw that these work wonderfully as embedded selectors, but you must be careful with them.  They are expanded when the script is created, which is when a new configuration becomes active.  Some $Target$ properties might change between the time the script is created from the then-active configuration and something happens that causes a new configuration to become active (and hence the script to be re-created).  If there's a $Target$ property that can afford to be a little stale, you can embed it.  If not, pass it as an argument.
        3. $MPElement$ selectors are all tied to the configuration, so they're safe.  When they change, the configuration is changing, so the script will be re-created anyway.
      2. I have encountered some quirks, so don't think you're losing your mind if you see strange things in your script.  I suspect that some combinations of selectors embedded in scripts are not expanded properly, but I have been thus far unable to reliably reproduce the problem.  It definitely happens, so be on the lookout.
      3. The MP alias used in the selectors must be appropriate to the MP in which the monitor is being created.  If you are creating an MP yourself, that's easy to keep straight.  If not, you may not be able to predict the alias.  System.Library, for instance, is sometimes referenced as System! and sometimes as SystemLibrary<Version>!, where <Version> is whatever version of that MP was installed when the reference was made.  An easy way to determine this is to add something as an argument using the UI.  It will use the correct alias, and you can just copy it from there.

      Debugging the Expanded Script

      The final item that is relevant is how this can help debugging.  This goes back to the execution environment discussion I started with.  When the MOM.ScriptAPI COM object is used to submit property bags, performance data, discovery data, etc., it does so using STDOUT.  That's it.  Therefore, running your script from the location in which it was created (under Health Service State) works just fine.  It just prints the XML representation of the data item that would be picked up by OpsMgr if the script had been run inside its workflow.

      Figure 6: A property bag output to STDOUT after an interactive run of the script.

      Conclusion

      I hope you find this information useful.  I believe it is very helpful to understand the machinations that a script undergoes well before it even hits its execution environment.  It's also useful to have so much more data from the context of the OpsMgr configuration, target, etc. available to use.  It makes scripting that much more robust, which is a major component of OpsMgr's appeal.

      Distributed Applications Custom MP

      If you have begun to work with distributed applications, you have begun to work with one of the most impressive areas of Operations Manager. This is the facility that allows you to truly take the health model approach to the next level. Essentially, it is a graphical tool that allows you to model your infrastructure along service boundaries, as opposed to the physical boundaries to which we have always been accustomed.

      Behind the scenes, Operations Manager creates many entities in the management pack to which you save your distributed application. These are classes, instances of those classes, relationships, monitors, etc.

      I plan to blog about several facets of understanding and implementing distributed applications in the future, but one foundational element is an MP I have written that includes a new distributed application template that I think you will find useful.

      It is a completely blank distributed application template.  It is very similar to the "Blank (Advanced)" template that ships with OpsMgr, but it is truly blank.  It appears as "Empty Distributed Application" in the template list when you create a new distributed application.

      There are two differences between my "Empty Distributed Application" template and the "Blank (Advanced)" template that ships with OpsMgr:

      1. The "Blank (Advanced)" template includes a dependency monitor named "Blank Distributed Application Health Roll-up". This is a curious monitor for three reasons:
        1. It is a dependency monitor whose parent is Entity Health -> Availability, yet it creates a dependency link to the Entity Health aggregate monitor for everything in the DA. That's somewhat redundant.  Actually, I think it would be best described as "recursively redundant," since that gives you Entity Health -> Availability -> (this monitor) -> Entity Health -> Availability -> ...
        2. The logical function of this monitor is already performed by the "All Contained Objects" dependency monitor under Availability (hence the redundant part of my comment).
        3. This monitor is disabled.
      2. Along these same lines, my template adds a dependency monitor named "All Contained Objects" under Configuration, Performance, and Security, to match the one that is under Availability by default. These are enabled by default but can be easily overridden if desired. You may want certain DAs created from this template to be availability-only, etc., but I think the presence of the monitors will help.
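      As a sketch of what one of these added monitors looks like in the resulting MP, here is a hypothetical dependency monitor under Performance.  The IDs and target are made up, and Health! and System! are assumed aliases for the health and system library references:

```xml
<DependencyMonitor ID="MyDA.Performance.AllContainedObjects" Accessibility="Public"
                   Enabled="true" Target="MyDA" Remotable="true" Priority="Normal"
                   ParentMonitorID="Health!System.Health.PerformanceState"
                   RelationshipType="System!System.Containment"
                   MemberMonitor="Health!System.Health.PerformanceState">
  <Category>PerformanceHealth</Category>
  <!-- WorstOf rolls the worst member state up to the DA, mirroring the
       default "All Contained Objects" monitor under Availability -->
  <Algorithm>WorstOf</Algorithm>
</DependencyMonitor>
```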

      The current version of the sealed MP is 24.1.1.1010.  The MP is available for download here:

      com.focus24.Distributed.Applications1010.zip

      VJD/<><

      Posted by vdipippo | with no comments

      OpsMgr Lineage Explorer

      Summary
      As I've been researching management pack elements over the course of the past 18 months, one task that I find myself doing over and over again is researching the lineage of management pack elements.  With the inheritance and composite module features of the OpsMgr design, this can take you down quite a road.  I have long wanted to write a tool that makes this easier.  This is it.

      What This Tool Does
      This tool allows you to explore the lineage of OpsMgr MP elements.

      Why This Tool Is Useful
      OpsMgr is very object-oriented in its design.  As such, it includes inheritance for class types and relationship types.  Also, the workflow engine in OpsMgr is the driving force behind all discoveries, monitors, and rules.  All discoveries, monitors, and rules are based on modules that perform various functions.  All module types derive, eventually, from either managed code or native code.  One major difficulty in researching a new management pack you've loaded (or from which you intend to learn) is that module types can also be "composite" modules, which contain any number of other modules that are linked together to form a new module type.  Often, you will find a discovery that uses a data source module that is a composite of three or four modules.  Those three or four modules can themselves be composites of several modules.  Some management packs have modules that must be un-wrapped to four or five levels before you reach the base modules that do the actual work of the workflow.
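      To illustrate what gets un-wrapped, here is a hypothetical composite data source.  The member module types are the standard scheduler and script-probe modules from the system and Windows libraries (shown from memory, so treat the names and configuration as a sketch).  The Composition block is the tree the tool expands, and either member could itself be a composite requiring another level of expansion:

```xml
<DataSourceModuleType ID="MyMP.ExampleDataSource" Accessibility="Internal">
  <Configuration>
    <xsd:element name="IntervalSeconds" type="xsd:integer" />
  </Configuration>
  <ModuleImplementation>
    <Composite>
      <MemberModules>
        <DataSource ID="Sched" TypeID="System!System.Scheduler">
          <Scheduler>
            <SimpleReccuringSchedule>
              <Interval Unit="Seconds">$Config/IntervalSeconds$</Interval>
            </SimpleReccuringSchedule>
            <ExcludeDates />
          </Scheduler>
        </DataSource>
        <ProbeAction ID="Script" TypeID="Windows!Microsoft.Windows.ScriptPropertyBagProbe">
          <ScriptName>Example.vbs</ScriptName>
          <Arguments />
          <ScriptBody><![CDATA[' script body omitted in this sketch]]></ScriptBody>
          <TimeoutSeconds>60</TimeoutSeconds>
        </ProbeAction>
      </MemberModules>
      <!-- Data flows from the inner node outward: Sched triggers Script -->
      <Composition>
        <Node ID="Script">
          <Node ID="Sched" />
        </Node>
      </Composition>
    </Composite>
  </ModuleImplementation>
  <OutputType>System!System.PropertyBagData</OutputType>
</DataSourceModuleType>
```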

      This tool loads your OpsMgr environment from the database and then allows you to inspect the class types, relationship types, and module types installed.  It allows you to expand them to see their lineage: their parentage, in the case of class types and relationship types, or their composition tree down to the underlying native or managed modules, in the case of module types.

      Future Directions
      I am currently working to extend this viewer to include:

      • MonitorType Lineage
      • Discovery DataSource Lineage
      • Discovery Target Lineage (useful for disabling the "root" discoveries for a management pack)
      • Rule Lineage

      I also plan to add more features around presenting the data:

      • DisplayStrings.  Everything is shown by its ID right now; the ID is the most important item for using what you see in your own MP, but display names will make browsing easier.
      • A view into the <Configuration/> block for modules.  This is easy to do at face value, but to make it intelligent, the schema really needs to be parsed so one can easily see how a composite module is using a member module.  This would also involve expanding the SchemaTypes involved.  Many times, this is a simple $Config$ reference, which just passes the problem up the chain; however, the implementations of many modules do not require the configuration of certain member modules to be specified.  Instead, they configure the member module directly.  This is often the case with the Expression Filter.  The bottom line is that a view into the <Configuration/> block will be introduced shortly, but it may or may not be introduced with some intelligence around parsing the schema.  At some point, however, that would be the desired end state for this part of the tool.
      • I have a function, not yet completely implemented, that will allow a module to be right-clicked and selected as the filter for the entire display.  This would be very useful, for example, in determining which modules eventually use a script, publish performance data, or generate alerts.  Most management packs are intuitive, but I've seen a few write actions in rules that bury the GenerateAlert module a few levels deep, which makes it difficult to flag them as alert-generating.
      • Exception handling will improve as I get feedback and break the tool more myself.  I'm following the basic premise that I recently discussed with a friend and peer developer: I'm not catching exceptions unless I plan to do something useful with them.  This is a fairly innocuous tool at the moment, so I'm assuming that you'd like to see the stack trace as much as I do.  I'll spare everyone from the Spartan "Could not connect to database." message box for the time being.
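      To illustrate the two configuration styles described above, here is a sketch of a composite's member modules (all IDs and aliases hypothetical).  The first member forwards a $Config$ reference up the chain; the second configures its member, an Expression Filter, directly:

```xml
<MemberModules>
  <!-- Pass-through: the composite's caller supplies the interval -->
  <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.TimedScript.PropertyBagProvider">
    <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
    <!-- remaining timed-script configuration omitted from this sketch -->
  </DataSource>
  <!-- Direct configuration: the expression is hard-coded by the composite -->
  <ConditionDetection ID="Filter" TypeID="System!System.ExpressionFilter">
    <Expression>
      <SimpleExpression>
        <ValueExpression>
          <XPathQuery Type="String">Property[@Name='Status']</XPathQuery>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value Type="String">OK</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </ConditionDetection>
</MemberModules>
```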

      Getting the Tool
      The current build (227) is available for download here:

      LineageExplorer227.zip

      VJD/<><
