in

myITforum.com

This Blog

Syndication

1E Blog

Empowering Efficient IT

Troubleshooting Failed Wakeups

Troubleshooting Failed Wakeups

I just completed a very long process of remotely assisting a prospective customer with troubleshooting why a client wasn’t waking up using their new install of the NightWatchman Management Center and 1E WakeUp (in a “stand-alone” model; no SMS/CfgMgr integration). It occurred to me that it might be a good idea to document the process and include some log snippets that illustrate the process. This may help someone else in tracking down exactly where the problem lies. This particular problem was kind of unique as you will see, but quite easy to induce if one doesn’t read the Install Guides carefully.First, let’s review the pertinent portion of the architecture first. This image is a portion of Figure 1 – The NightWatchman Management Architecture, found in The NightWatchman management Center Installation Guide.

 

I’ve added the numbered call-outs for ease of reference throughout this article.

The Scenario

The Administrator initiates an “On Demand WakeUp” of the client machine, named “Target”(#7, above).

The Assumptions

·         The WakeUp agents are all properly installed and running on all clients (multi-agent mode)
·        
The clients have all reported their inventory into the NWM MgmtCtr server’s database
·        
The client PC’s are all properly enabled for WakeOnLan support
·         All 1E product logs are set to DEBUG logging level for maximum details

o   hklm/software/1e/<productname>
o  
Name: DEBUG
o   Value: 255

The Flow (numbered to coincide with callouts, above)

1.       The Administrator initiates the On Demand Wakeup from the NWM MgmtCtr Console

 2.       the NWM MgmtCtr “Service” account passes the request to the WU Server service (installed on the same server in this scenario)

a.       confirm proper processing by the NWM Console “Service” via WMI on the WU Server
b.     
see the log file:
 
..\docs&Settings\AllUsers\ApplData\1E\NightWatchmanManagementCenter\NWM.ServiceHost.exe
c.        Look for entries similar to the following:
[Info] 8/24/2009 1:05:36 PM: Job <WakeUpComputers>b__9 is being queued...
[Info] 8/24/2009 1:05:36 PM: Job <WakeUpComputers>b__9 is starting...
[Verbose] 8/24/2009 1:05:36 PM: Executing procedure '[dbo].[spNWM_FetchComputerDetailsByArray]'
[Info] 8/24/2009 1:05:36 PM: Sending WakeUps to 1 machines
[Info] 8/24/2009 1:05:36 PM: Attempting to connect to WMI namespace '\\NWMMGMTCTRSERVER.lab.RI.1e.local\root\N1E\WakeUp'
[Info] 8/24/2009 1:05:36 PM: Connected to WMI namespace '\\NWMMGMTCTRSERVER\root\N1E\WakeUp'
[Verbose] 8/24/2009 1:05:36 PM: Creating WMI Machine with name: TARGET, domain: LAB, MacAddress: 00:19:BB:54:68:C0, IPAddress: 192.168.1.106, Subnet: 192.168.1.0, SubnetMask: 255.255.255.0
[Verbose] 8/24/2009 1:05:43 PM: PolicyMembershipRefreshTask.Execute()
d.     
Signs of trouble (this instance was due to the Console Service not having permission to WMI on the NWM MgmtCtr server):
[Info] 8/21/2009 3:13:40 PM: Sending WakeUps to 1 machines
[Info] 8/21/2009 3:13:40 PM: Attempting to connect to WMI namespace '\\NWMMGMTCTRSERVER\root\N1E\WakeUp'
[Info] 8/21/2009 3:13:40 PM: Connected to WMI namespace '\\NWMMGMTCTRSERVER \root\N1E\WakeUp'
[Verbose] 8/21/2009 3:13:40 PM: Creating WMI Machine with name: TARGET, domain: UI,
acAddress: 00:22:19:1F:02:5B, IPAddress: 192.17.51.53, Subnet: 192.17.48.0, SubnetMask: 55.255.252.0
[Error] 8/21/2009 3:13:40 PM: An error occurred while executing on-demand wakeup:
Type:
System.Management.ManagementException

Message:
Access denied

3.       The “WakeUp Server” service receives the wakeup request from the console, gathers up the “location” info of the target machine from the database (IP and MAC addresses), determines which WU Agent is “registered” as the  primary (or alternate) agent responsible for that subnet, creates a “Wakeup List” of who to wake, and hands that list off to its partner the WakeUp (client) agent (also installed on the WakeUp server, and identical to all other client agents)

a.       See the log file
..\docs&Settings\AllUsers\ApplData\1E\WakeUpSvr\WakeUpSvr.log
b.     
Look for entries similar to the following (in this log, the registered “primary agent” happens to be the WU agent on the server, or “Local Agent”. This log example was taken where the target is on the same subnet as the server. In a true remote subnet scenario, “Local Agent” references are replaced by a client name on the remote subnet. The process is identical, however):
8/24/2009 09:13:53: Connected to WakeUp agent on "Local Agent" - 127.0.0.1 ßmy pri agent installed on my WU Server, same subnet as all clients (#4 in callouts above)

8/24/2009 13:05:37: accTCP[0] 127.0.0.1
8/24/2009 13:05:37: recvTCP[0] len = 804 WAKELIST
ßWho am I supposed to wake up?
                    TCP WAKEDONE sending 32
                    TCP WAKEDONE sending 32
8/24/2009 13:05:38: 1 entry added to the Wakeup list
ß
this is “who” - my “target” client name added to this “list”
8/24/2009 13:05:38: hWakeUpsEvent
8/24/2009 13:05:38: Process WakeupList len=1
8/24/2009 13:05:38: A2Q Wake Up 192.168.1.106 255.255.255.0 [00:19:BB:54:68:C0]
ß from Inventory; this is the target client’s IP/MAC addresses

Agent=NWMMGMTCTRSERVER
ß ...and this is the Primary Agent tasked with getting the Wakeup List
8/24/2009 13:05:38: Pre-Resolved PRIMARY address NWMMGMTCTRSERVER 192.168.1.11
                    Setting PRIMARY agent to resolved ip address 192.168.1.11
8/24/2009 13:05:38: TCP Connecting to "NWMMGMTCTRSERVER" - 192.168.1.11:1776 (1)
8/24/2009 13:05:38: Writing {B6FC94C2-03BA-44D2-A749-0920BDB1D237}
TimeStamp=2009-08-24 17:05:38

c.       8/24/2009 13:05:38: TCP Sending.. to NWMMGMTCTRSERVER - 1 Wakeups len=76 typ=0003
                    TCP WAKE_TCP sending 76
                    Agent state change detected, Sending wakeups...
                    Rippling up subnet state for 192.168.1.0
8/24/2009 13:05:38: TCPreply len = 36 ACKDONEV
8/24/2009 13:05:38: 1 wakeups sent to agent NWMMGMTCTRSERVER for subnet 192.168.1.0
ß
sending the request to my local server’s companion WU Agent
8/24/2009 13:05:38: 192.168.1.0 - Released 1 items
8/24/2009 13:05:38: TCP Closing "NWMMGMTCTRSERVER"
ß Wakeup list sent; This portion is complete

... and now we wait for feedback on the WU request from the Primary Agent in Step 5...
d.      The Primary Agent in Step 5 returns the following status details back to the server

8/24/2009 13:05:43: recvTCP[0] len = 76 WUREZULT ß here we get the results back from my local agent that DID the actual WU
8/24/2009 13:05:43: WUREZULT subnet-192.168.1.0 hJob=E9BBD3CC Awake=1 Wokeup=0 WillPing=0 Bad=0 Failed=0
                    Awake    2:LAB\TARGET 192.168.1.106 00:19:BB:54:68:C0
ß and we are done; this particular client was already awake

8/24/2009 13:05:43: Writing {B6FC94C0-03BA-44D2-A749-0920BDB1D237}
ß updating WU statistics at this point
TimeStamp=2009-08-24 17:05:43
JobId=LABE9BBD3CC
AgentSubnet=192.168.1.0
AgentName=NWMMGMTCTRSERVER
Awake=1
WokeUp=0
WillPing=0
Failed=0

4.       The WakeUp Server’s Agent receives the WU List and proceeds to hand it off to the Primary Agent on the remote subnet

a.       In these logs the WU Server’s “agent” and the “Primary” agent for this subnet are one and the same
b.      proceed to the next step

5.       The Primary Agent receives the WakeupList from the WU Server Agent...

a.        see the log:
..\docs&Settings\AllUsers\ApplData\1E\WakeUpAgt\WakeUpAgt.log
b.        Look for entries similar to the following:

8/24/2009 15:53:08: recvTCP[0] len = 76 WAKE_TCP
                   TCP ACKDONEV sending 36
8/24/2009 15:53:08: Processing JobId=E9C9FC67 from 192.168.1.11 contains 1 wakeups with action 0 ß
what to do?
8/24/2009 15:53:08: Processing AgentJob id=E9C9FC67:0 cnt=1 03
8/24/2009 15:53:08: send_magicpkt (03) for 192.168.1.106 broadcast on 192.168.1.255 [00:19:BB:54:68:C0] ß Do it here!

6.        ... and sends the Magic Packet to the TARGET client on its subnet, gets the results of that action and sends same back up to the server

a.        still reading the log in the Primary Agent from Step 5:

8/24/2009 15:53:08: recvUDPmsg len = 118 MAGICPKT ß I just did it
8/24/2009 15:53:08: recvUDPmsg len = 24 WASAWAKE ß response from the client
 
                  
WASawake 00:19:BB:54:68:C0 id=E9C9FC67
8/24/2009 15:53:08: Closing[0] 0
8/24/2009 15:53:13: WakeupFeedback jobId=E9C9FC67 cnt=1
                    TCP WUREZULT sending 76 to 192.168.1.11:1777 ß
sending results back to the server
8/24/2009 15:53:38: Sent HELLOWAK to 192.168.1.255 E9C9FC67:0
8/24/2009 15:53:38: senddatagram (3912) len=28 HELLOWAK
8/24/2009 15:53:38: recvUDPmsg len = 28 HELLOWAK ß and say “good morning” to be polite J

7.       TARGET client wakes up, or simply states “I was already awake”

a.        see the log on TARGET:
..\docs&Settings\AllUsers\ApplData\1E\WakeUpAgt\WakeUpAgt.log
b.       
In our example the client was already running so you should see in its log:

8/24/2009 15:53:09: Sent WASAWAKE to 192.168.1.11, E9C9FC67 ßnote "reply back" referencing same “AgentJob id in 6.b, above

Summary

All of this may seem overly complicated, but the truth is that you will rarely ever need this level of detail. The system generally “just works” when installed properly and observing the various caveats, port requirements, firewall exceptions and prerequisites that are clearly documented in the respective Installation Guides. When things don’t go as expected, simply remember what the flow is from top to bottom, enable DEBUG=255 to expose the detail necessary, and follow the logs until you find the point where things break down. The most common causes of failure are:

·         Client is not WOL enabled in both its BIOS and on the NIC properties (1E Support can be a big help here, using automated tools to remotely enable these difficult elements!)
·         No primary or alternate agent can be found and registered on the remote subneto   management of that subnet reverts back to the WU Server which more than likely cannot send magic Packets across the WAN to the remote client

o   Most common reasons for this registration failure are documented in my related blog post 1E Blogs | 1EWakeup Console: Subnet and Agent Registration Process

I hope you find this info useful and helps you get out of a jam as it did for my customer this past week!

Read the complete post at http://www.1e.com/techblog/post/2009/08/25/Troubleshooting-Failed-Wakeups.aspx

Published Aug 25 2009, 04:19 PM by 1E Blogs
Filed under:
Copyright - www.myITforum.com, Inc. - 2010 All Rights reserved.
Powered by Community Server (Commercial Edition), by Telligent Systems