Monitoring with Orchestrator

This article will detail how to turn Orchestrator into a “poor man’s” Operations Manager by showing some of the monitoring activities available.

Background

This method is intended for those who do not already have a monitoring solution in place. It involves creating a separate runbook for each item that you want to monitor. This is required because the monitoring activities are events and cannot accept incoming links, which means that they must initiate the runbook. A monitoring activity keeps running until its trigger condition is met or you stop the runbook. For example, as long as a monitored computer stays up, the monitoring activity continues to run. Once the runbook moves beyond the monitoring step (i.e. whatever we are monitoring fails), it runs its remaining activities and then quits. To keep monitoring, we re-execute the runbook as a final step, as I will show in my examples.

This method can also be used alongside an existing monitoring solution, because Orchestrator can restart failed services or processes in addition to alerting on them. I will cover this in more detail later.

This article details five separate processes: a check to see whether the device is still communicating, a disk space check, monitoring and restarting a process, monitoring and restarting a service, and monitoring using WMI. For any of the runbooks to function, Windows Remote Management must be enabled between the runbook server and the client device. For more information on Windows Remote Management, go here: http://technet.microsoft.com/en-us/library/dd759202.aspx#BKMK_gp (this is for servers, but the process for clients is the same).

Device Communication

This check uses two simple runbooks. I will call mine “Monitor Device” and “Check for Device”. They work together to monitor a device and send an email if Orchestrator cannot communicate with it.

Monitor Device

[Screenshot 1]

To start, drag the “Monitor Computer/IP” activity into your runbook (located under the “Monitoring” node). Double-click it, and enter the device name or IP in the “Computer” box. Next, change the “Trigger condition” to “The computer is not reachable”. Finally, change the “Test frequency” to fit your needs. Thirty seconds is usually a good number.

Next, drag the “Send Email” activity into your runbook. Double-click it, and fill in a subject, the recipients, and the message text. The wording is entirely up to you.

[Screenshot 2]

Next, click on the “Connect” node. Fill out these settings as they apply to your environment.
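For readers who think in script terms, the fields of the “Send Email” activity map roughly onto a plain SMTP message. The following is a minimal Python sketch of that idea, not Orchestrator code. The helper names `build_alert` and `send_alert`, the addresses, and the device name are hypothetical placeholders.

```python
from email.message import EmailMessage
import smtplib


def build_alert(device: str, sender: str, recipients: list) -> EmailMessage:
    """Build the alert: subject, recipients, and message text (placeholder wording)."""
    msg = EmailMessage()
    msg["Subject"] = f"ALERT: {device} is not reachable"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"Orchestrator could not communicate with {device}.")
    return msg


def send_alert(msg: EmailMessage, smtp_host: str, port: int = 25) -> None:
    """Send via an SMTP server; these values play the role of the "Connect" settings."""
    with smtplib.SMTP(smtp_host, port) as server:
        server.send_message(msg)


if __name__ == "__main__":
    # Hypothetical values; nothing is sent here, the message is only built.
    alert = build_alert("SERVER01", "orchestrator@example.com", ["admin@example.com"])
    print(alert["Subject"])
```

Building the message separately from sending it makes it easy to check the wording before pointing the sketch at a real mail server.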

I will come back for the “Check for Device” activity.

Check for Device

Now we need to build the second runbook for this alert. This runbook will wait for the device to come back online, then execute the first runbook again. I use this method because otherwise I would either receive an email every 30 seconds until the device is back online, or I would have to stop the runbook.

[Screenshot 3]

As you can see, this runbook is very similar to the last one. The only difference is that in the “Monitor Computer/IP” activity, you select the “The computer is reachable” option under “Trigger condition”. You can even put a “Send Email” step in the middle if you want an alert that the device is back online. For the “Monitor Device” step, drag an “Invoke Runbook” activity (located in the “Runbook Control” node) into your runbook. Double-click it and select the “Monitor Device” runbook.

Finally, go back to the first runbook and drag an “Invoke Runbook” into your runbook. Double-click and select the “Check for Device” runbook in the “Runbook” box.
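The handoff between the two runbooks can be pictured as a small state machine. The sketch below is a simulation in Python, not something Orchestrator runs. The function name `alerts_for` and the sample data are made up for illustration. It replays a series of reachability checks and records when an email would be sent.

```python
def alerts_for(samples: list) -> list:
    """Replay reachability samples (True = reachable) through the two-runbook handoff.

    "Monitor Device" waits for the device to go down, sends one email, then
    hands off to "Check for Device", which waits for the device to come back
    and hands control back to "Monitor Device".
    """
    events = []
    active = "Monitor Device"  # the runbook currently waiting on its trigger
    for reachable in samples:
        if active == "Monitor Device" and not reachable:
            events.append("email: device down")
            active = "Check for Device"
        elif active == "Check for Device" and reachable:
            active = "Monitor Device"
    return events


if __name__ == "__main__":
    # One long outage followed by a second, separate outage.
    checks = [True, False, False, False, True, False]
    print(len(alerts_for(checks)))  # prints 2
```

In this sample, the device is unreachable on four checks but only two emails go out, one per outage, which is the point of pairing the runbooks.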

Disk Space

Next, we will monitor disk space. This will be similar to the last alert in that this will actually be two runbooks. I will call my runbooks “Check for Full Disk” and “Check for Good Disk”. These two runbooks will work together to ensure that my disks do not get full.

Check for Full Disk

[Screenshot 4]

To start, bring a “Monitor Disk Space” activity into your runbook. Double-click it, and add a criterion. Now, click on “X:”. This allows you to tell the runbook which device and drive to monitor. Fill this in appropriately. Next, click “greater than” and change it to “less than”. Then click on the number and change it to fit your needs. Finally, click on “megabytes” and change it to fit your needs as well. I suggest changing this value to “percent of total”. Here is what it should look like:

[Screenshot 5]

You can also add additional criteria here for more drives on a device.
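The criterion amounts to a simple comparison of free space against a threshold. As a rough illustration only, here is the same check in Python for a local drive. The function names are hypothetical, the 10 percent default is just an example value, and `shutil.disk_usage` only sees local paths, whereas the Orchestrator activity checks a remote device.

```python
import shutil


def percent_free(free_bytes: int, total_bytes: int) -> float:
    """Free space expressed as a percentage of the total size."""
    return 100.0 * free_bytes / total_bytes


def is_disk_full(path: str, threshold_percent: float = 10.0) -> bool:
    """True when free space is less than the threshold ("less than ... percent of total")."""
    usage = shutil.disk_usage(path)
    return percent_free(usage.free, usage.total) < threshold_percent


if __name__ == "__main__":
    # "." is the current drive; substitute the drive you care about.
    print(is_disk_full("."))
```

Using a percentage rather than a fixed number of megabytes lets the same threshold work across drives of different sizes.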

Next, add a “Send Email” activity and configure it like the “Monitor Device” email activity.

Just as with “Monitor Device”, we are going to skip the “Check for Good Disk” activity for now.

Check for Good Disk

[Screenshot 6]

Now, just as before, we are going to do the opposite. Drag a “Monitor Disk Space” activity into your runbook and double-click it to open it. Configure the criteria like this:

[Screenshot 7]

This makes the runbook continue after the drive space has fallen back below 10%. Next, we need to invoke the “Check for Full Disk” runbook. To do this, drag an “Invoke Runbook” activity into your runbook, open it, and select the “Check for Full Disk” runbook. Again, you can also add a “Send Email” step in the middle if you want.

Finally, go back to the “Check for Full Disk” runbook and add the “Invoke Runbook” activity again. This time, make it execute the “Check for Good Disk” runbook.

Monitor and Restart a Process

These runbooks will be used to monitor critical processes and restart them if they fail. They can be used in conjunction with an already in-place monitoring solution because of their ability to go ahead and restart a process if it fails.

[Screenshot 8]

To begin, drag a “Monitor Process” activity into your runbook. Double-click it to open and fill out the “Properties” box. If you select the computer first, you can then click on the ellipsis (…) and select the process from that computer that you want to monitor. Next, select the “Process is stopped” button and set the frequency of your test. In this example, I am monitoring the ccmexec.exe process on a device.

[Screenshot 9]

Next, drag the “Run Program” activity (located under the “System” node) into your runbook. Double-click it to open, and fill out the “Mode” and “Details” boxes according to the program (process) you are executing.

As a side note, if you click on the “Security” node, you can run the program as another account. This may be needed if your process must execute as a particular account. Also, if the program requires administrator access to run, the runbook service account will need to be an administrator on the device (unless you elect to run the program as another user).

Now I want to be alerted if the runbook was unable to execute the program. To do that, drag a “Send Email” activity into your runbook. Fill it out according to your needs. IMPORTANT: we also need to change the link. Double-click it, and click on “Success”. Change this to “Failed”. You can also change the link color to red by clicking on “Options”.

[Screenshot 10]

Just as with the communication and disk space runbooks, I do not want to receive an email every 30 seconds for the same process. I need to build another runbook to monitor for when the process is reactivated after a failure, then rerun this runbook.

[Screenshot 11]

To do this, simply set the “Monitor Process” activity to check that the process is started. It will then execute the first runbook.

Invoke this runbook after the “Send Email” activity. That will prevent a crashed process from filling up your inbox.

Now what happens to the runbook if it successfully restarts the process? We need to ensure that it executes again and continues to monitor our process. To do this, add another “Invoke Runbook” activity and set it to the current runbook. Link it after the “Run Program” activity.
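The branching in this runbook can be summarized in a few lines. The Python sketch below only models the order of the steps and does not start or monitor anything. The function name and the label “wait-for-process runbook” are hypothetical, since the second runbook is not given a name here, and `restart` stands in for the “Run Program” activity.

```python
from typing import Callable


def on_process_stopped(restart: Callable[[], bool]) -> list:
    """Return the steps taken once "Monitor Process" fires.

    `restart` stands in for "Run Program" and returns True on success.
    """
    steps = ["Run Program"]
    if restart():
        # Default (success) link: re-invoke this runbook so monitoring resumes.
        steps.append("Invoke Runbook: this runbook")
    else:
        # "Failed" link: alert once, then wait for the process to come back.
        steps.append("Send Email")
        steps.append("Invoke Runbook: wait-for-process runbook")
    return steps


if __name__ == "__main__":
    print(on_process_stopped(lambda: True))
    print(on_process_stopped(lambda: False))
```

Either branch ends in an “Invoke Runbook” step, so some runbook is always left watching the process.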

Monitor a Service

This runbook is going to be similar to the previous runbook. It will monitor a service and restart it if it fails.

[Screenshot 12]

To begin, drag the “Monitor Service” activity into your runbook. Fill out the “Properties” box according to what you are monitoring. As with the processes, if you enter the computer name first and click on the ellipsis (…), you will get a list of all services (started, stopped, or disabled) on that computer to select from. Next, check the “Service is stopped or paused” option in the “Trigger when” box. Finally, check the “Restart stopped service” check box. This event will actually restart the service for you if it stops. You also have the option to set a test frequency. In this example, I am monitoring the Orchestrator Runbook Service.

[Screenshot 13]

Next, we need to add a “Send Email” step. This will send us an email if the service fails to start. Configure this as you see fit. Remember to change the link to “Failed” like we did in the “Monitor Process” runbook.

Next, we need to prevent an email every 30 seconds until the service is back up. To do this, create another runbook (I call mine “Check for Service”) and add the “Monitor Service” activity to it. Fill out the “Properties” box accordingly, and just change the “Trigger when” option to “Service is started”. Next, add an “Invoke Runbook” activity that kicks off the “Monitor Service” runbook.

[Screenshot 14]

Finally, add another “Invoke Runbook” activity to the “Monitor Service” runbook and set it to the “Check for Service” runbook.
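When testing these runbooks, it can help to confirm a service's state by hand. One way, outside Orchestrator, is the Windows `sc` command. The Python sketch below parses the STATE line from `sc query` output. It assumes the usual English-language layout (for example `STATE : 4  RUNNING`), which may differ on localized systems, and the function names and the service name `MyService` are hypothetical.

```python
import re
import subprocess
from typing import Optional


def parse_sc_state(output: str) -> Optional[str]:
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", output)
    return match.group(1) if match else None


def service_state(computer: str, service: str) -> Optional[str]:
    """Run `sc \\\\computer query service` (Windows only) and return the state."""
    result = subprocess.run(
        ["sc", rf"\\{computer}", "query", service],
        capture_output=True,
        text=True,
    )
    return parse_sc_state(result.stdout)


if __name__ == "__main__":
    # Illustrative sample text, so the parser can be tried on any platform.
    sample = "SERVICE_NAME: MyService\n        STATE              : 1  STOPPED\n"
    print(parse_sc_state(sample))  # prints STOPPED
```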

Monitoring WMI

Finally, it is worth mentioning that you can also monitor anything in WMI using a similar method. I will not go into much detail on this because the WMI repository is so vast. This gives the administrator a way to monitor anything else on the device that is not a service, process, or disk space.

[Screenshot 15]

As you can see from this screenshot, the process is similar. You give Orchestrator a computer, the WMI namespace, and a WMI query to run.
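To give a feel for what such a query can look like, here is a small Python helper that assembles a WQL event query as a string. This is a sketch under assumptions: it assumes the activity accepts an event-style query of the `__InstanceModificationEvent` form, the helper name is hypothetical, and `Win32_Service` is used only because its `Name` and `State` properties are widely known (the dedicated “Monitor Service” activity is simpler for services).

```python
NAMESPACE = r"root\cimv2"  # the common default WMI namespace


def modification_event_query(wmi_class: str, condition: str, poll_seconds: int = 30) -> str:
    """Build a WQL event query that fires when an instance of `wmi_class` changes
    and the changed instance satisfies `condition`."""
    return (
        f"SELECT * FROM __InstanceModificationEvent WITHIN {poll_seconds} "
        f"WHERE TargetInstance ISA '{wmi_class}' AND {condition}"
    )


if __name__ == "__main__":
    query = modification_event_query(
        "Win32_Service",
        "TargetInstance.Name = 'MyService' AND TargetInstance.State = 'Stopped'",
    )
    print(NAMESPACE)
    print(query)
```

The `WITHIN` clause sets how often WMI polls for changes, so it plays a role similar to the test frequency in the other monitoring activities.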

Summary

While this method is not as robust as Operations Manager, it will get the job done. I hope this helps you try to get a handle on monitoring with Orchestrator.

  • Five9vs

    pretty cool, another approach is to have a local script (scheduled task) call a alert report runbook when an alert condition appears.

  • Dipan Ghosh

    Hi there

    Thanks for the article, it is very helpful. I have a couple of quick questions if I may. I am relatively new to Service Manager and Orchestrator. I am trying to do a couple of things with them:

    1) I want to monitor disk space on servers. If the disk space goes over a certain threshold, I want to delete certain files, and log an incident in service manager and mark that as resolved.

    2) I want to monitor some services on servers, if the service stops, to automatically start the service and similarly log a ticket in service manager and mark that as resolved.

    I have Operations Manager installed and I can use Operations Manager to leverage the monitoring side of things. Can this be done? If so, can you please point me towards the direction of how to. I would really appreciate any assistance. Please help :(

    Regards,