It all started with a call about poor performance when printing from a local server. Then it escalated to poor performance in other apps running on that server. A quick look at the server confirmed constant 90-100% CPU utilization. The culprit? LSASS.EXE.
Since it occurred in the middle of the day, we first tried to get Server Performance Analyzer running on the box. This proved fruitless: the server was so bogged down that doing anything on it was an exercise in extreme patience, so running any diagnostics locally was out of the question.
Finally the day ended and I got into the server via the Integrated Lights-Out board (thank you, HP. How many times has this saved YOUR bacon??) The minute I disabled the NIC on the server, utilization dropped to normal and LSASS took a break. This clearly indicated an external cause – something on the local network was hammering the server with some form of AD query.
Since I couldn’t packet capture on the box, this had to wait until the next day, when our managed service provider for networks could set up a port span on the switch. Using good old Netmon and the Top Talkers Expert, I was able to pin the culprits down to several machines.
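Conceptually, a top-talkers report just aggregates captured traffic by source endpoint and ranks the totals. A minimal sketch of that idea in Python (the hostnames and byte counts are made up for illustration, and this is not Netmon's actual data format):

```python
from collections import Counter

def top_talkers(packets, n=3):
    """Rank source addresses by total bytes sent.

    packets: iterable of (source, byte_count) tuples, e.g. rows
    exported from a capture. Hypothetical stand-in for real data.
    """
    totals = Counter()
    for src, nbytes in packets:
        totals[src] += nbytes
    return totals.most_common(n)

# Three hosts; two are clearly hammering the server.
capture = [
    ("PC-SMITHA", 52000), ("PC-JONES", 1400),
    ("PC-SMITHA", 48000), ("PC-BROWNA", 61000),
    ("PC-BROWNA", 59000), ("PC-JONES", 900),
]
print(top_talkers(capture, n=2))
# -> [('PC-BROWNA', 120000), ('PC-SMITHA', 100000)]
```

The heavy hitters float straight to the top, which is exactly how the suspect machines stood out in the trace.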
Here’s where I made the first mistake. We name machines for the user, so every time a user gets a new machine we image it up and append an A or another suffix to the name. The machines showing up in the packet trace mostly had an A suffix, but when I looked in SMS, no machines of that name existed – only machines with the same name minus the suffix. So I chalked it up to a naming error on the part of the tech, and started investigating the machines I could see in SMS.
Lesson 1: Trust the trace. I should have confirmed MAC addresses at least.
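Cross-checking MACs from the trace against the asset inventory would have exposed the naming discrepancy immediately. A hedged sketch of that check, assuming you've exported hostname-to-MAC mappings from both the capture and SMS into plain dicts (the names and MACs below are hypothetical):

```python
def find_inventory_mismatches(trace_hosts, inventory):
    """Flag hosts seen in a packet trace whose MAC address is
    missing from inventory or registered under a different name.

    trace_hosts: dict of hostname -> MAC taken from the capture.
    inventory:   dict of hostname -> MAC taken from the asset DB.
    Returns (trace_name, mac, inventory_name_or_None) tuples.
    """
    mac_to_name = {mac: name for name, mac in inventory.items()}
    mismatches = []
    for name, mac in trace_hosts.items():
        registered = mac_to_name.get(mac)
        if registered != name:
            mismatches.append((name, mac, registered))
    return mismatches

trace = {"PC-SMITHA": "00:1F:29:AA:BB:01"}
sms = {"PC-SMITH": "00:1F:29:AA:BB:01"}
print(find_inventory_mismatches(trace, sms))
# -> [('PC-SMITHA', '00:1F:29:AA:BB:01', 'PC-SMITH')]
```

The same MAC registered under "PC-SMITH" in inventory but talking as "PC-SMITHA" in the trace is exactly the suffix mix-up described above – the trace was right all along.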
Around this time we started to see the issue on other servers in other parts of the country. Using traces from both offices, we finally figured out that the issue was isolated to the new HP 6930p laptops we were deploying. Unplug all of the new ones in an office from the network, and the LSASS issue on the server goes away.
Simple, or so I thought: load Netmon on one of the laptops and let it tell us what’s going on. After some digging, it turned out the traffic was attributed to a system process. So although Netmon could see the traffic, it couldn’t point to the application generating it.
Now we’re at the point of disabling and removing apps from the laptop. One of the first we try is HP Protect Tools, since this is the first time we’ve deployed laptops with it installed. That turns out to be the culprit: remove Protect Tools, and the LSASS issue goes away.
Alternatively, upgrading Protect Tools from version 4.00.3.001 to the latest version also fixes the issue.