It started at a K-12 client of mine. The above error would appear at a random point in the OS WIM download during OSD. PXE boot, USB boot, it didn’t matter. Sometimes you could try it again and it would work like a champ.
Further investigation into the logs showed that the task sequence would fail to download the file, retry, and fail again. Everything we looked at pointed to the network.
Then it happened at another client. Same issue, different network infrastructure. It still looked like a network issue. However, while troubleshooting, I noticed that if I ran a quick “wpeutil disablefirewall” on the client when the OSD process started, it seemed to image successfully. A problem with the WinPE 3 firewall? I couldn’t see how that would be the case. I considered modifying the task sequence to run that command first, disabling the firewall before anything else, but ultimately ran out of time at that client.
Finally it happened at an Enterprise client that had a great CCIE on staff. He came to me with issues of port flapping shortly after the imaging process started to fail. Some troubleshooting at the network level revealed that existing clients were in some cases trying to communicate with the MAC address of machines that were imaging. This was ultimately causing the switch to shut down the ‘rogue’ port for a period of time and bring it back online.
Some perusing around the Cisco forums led us to this gem:
If you don’t know about “Wake Up Proxy” and how it works, I suggest you go give it a read. It’s got great info.
In short, when a machine “known” to SCCM was reimaged, as we were doing dozens of times a day during the testing phase, one of the “subnet guardian” clients would decide that the machine being imaged must be asleep, since it could not ping it (because of the WinPE firewall). So the guardian machine would start to assume its MAC address. The important bits from the link:
“The redirection is achieved by the manager computer broadcasting an Ethernet frame that uses the sleeping computer’s MAC address as the source address. This makes the network switch behave as if the sleeping computer has moved to the same port that the manager computer is on. The manager computer also sends ARP packets for the sleeping computers to keep the entry fresh in the ARP cache. The manager computer will also respond to ARP requests on behalf of the sleeping computer and reply with the MAC address of the sleeping computer.”
Because the switch was actually receiving the same MAC on two different ports, it shut down the apparent ‘rogue’ port, causing the WIM download to error out (I assume because it was the only transfer large enough to be sensitive to the temporary port shutdowns).
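To make the failure mode concrete, here is a toy Python sketch of a learning switch that sees the same source MAC on two ports. This is not how any particular Cisco feature is implemented: the `Switch` class, the `flap_limit` threshold, the port numbers, and the MAC address are all made up for illustration. It only shows the general idea that a guardian sending frames with the imaging machine's MAC makes the address appear to bounce between ports, and a guard on the switch can then disable a port.

```python
from collections import defaultdict


class Switch:
    """Toy learning switch with a simple MAC-flap guard (illustrative only)."""

    def __init__(self, flap_limit=2):
        self.mac_table = {}             # MAC address -> port it was last seen on
        self.moves = defaultdict(int)   # MAC address -> number of port changes seen
        self.disabled_ports = set()     # ports the guard has shut down
        self.flap_limit = flap_limit    # hypothetical threshold, not a real default

    def receive(self, port, src_mac):
        """Process one frame. Returns False if the frame is dropped."""
        if port in self.disabled_ports:
            return False
        known_port = self.mac_table.get(src_mac)
        if known_port is not None and known_port != port:
            # The MAC appears to have moved to a different port.
            self.moves[src_mac] += 1
            if self.moves[src_mac] >= self.flap_limit:
                # Treat the port that triggered the limit as the 'rogue'.
                self.disabled_ports.add(port)
                return False
        self.mac_table[src_mac] = port
        return True


def simulate():
    sw = Switch(flap_limit=2)
    imaging_mac = "00:11:22:33:44:55"   # made-up address
    imaging_port, guardian_port = 1, 2  # made-up port numbers
    results = [
        sw.receive(imaging_port, imaging_mac),   # machine boots WinPE, MAC learned
        sw.receive(guardian_port, imaging_mac),  # guardian sends a frame with that MAC
        sw.receive(imaging_port, imaging_mac),   # WIM download traffic trips the guard
        sw.receive(imaging_port, imaging_mac),   # further traffic is dropped
    ]
    return sw, results


if __name__ == "__main__":
    sw, results = simulate()
    print(results)
    print(sw.disabled_ports)
```

In this sketch the imaging machine's port is the one that gets disabled simply because its frame is the one that crosses the threshold. On a real switch, which port is shut down and for how long depends on the configured feature and timers.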
Once we turned off “Wake Up Proxy” in the client settings and let that percolate to all of the clients, we were able to image again without issue. I assume that turning off the WinPE firewall at the beginning of the task sequence would also take care of it, but I have not had a chance to test that.