One of my big pet peeves around SCCM OSD used to be forgetting to “Clear Last PXE” or delete the existing object so I could re-image a machine while testing. You had to either clear it or delete it and wait for the machine to reboot before you could restart the process and walk away.
Well, here’s a trick you may not know. If you have the Command Prompt enabled in your boot.wim (and you DO have that enabled, right?), when the task sequence stops you can press F8 and go fix whatever it is you forgot to do. Then close the task sequence notification box and type “TSBOOTSHELL” in the command prompt window. Voilà! The task sequence starts over.
Note that you can’t do this if the task sequence has actually failed. Also, be sure to leave the command prompt window open until the task sequence is ready to reboot. If you close it before then, your machine will reboot without prompt or notice!
I got a call from a good friend of mine telling me that a mutual acquaintance of ours was having computer trouble. He’s a retired guy in his mid-seventies who has a ton of vacation pics and video on his laptop (a two-year-old HP).
Normally I pass on this kind of work. There are a couple of good local places to take these problems, and while I love playing in the Enterprise space, I loathe cleaning viruses and malware off neglected personal machines. But in this case I relented, mostly because the gentleman is facing medical issues and a raft of tests, and I wanted to be certain his data was safe.
His problem was that the machine would not log in normally, and it took over an hour to come up in Safe Mode. When I got there the machine was already logged in to Safe Mode, so I quickly created a new admin account and shut it down to take it home and work on it. He also gave me his external USB drive.
Once home, I booted the latest version of the Ultimate Boot CD (UBCD) from USB. If you’re unfamiliar with this utility, it is exactly what it claims to be: it contains nearly every utility you could need to resurrect an ailing computer.
Anyway, I first scanned the drive with F-Prot and then ClamWin, and was surprised to see it come back clean both times. I checked the SMART status of the drive and everything looked good. I rebooted normally, logged in with the new admin account, and everything seemed fine. I did a scan with SpyBot and an online scan with Trend. Both of these came back clean as well. That was odd; I was certain I’d find something on there. His Symantec A/V was up to date, too, whereas normally you’ll find that whatever A/V the computer came with is woefully out of date and begging to be licensed. I tried to run a file-level copy of his data out to the USB drive, but it failed partway through, locking up the machine and forcing me to reboot.
I rebooted back into the UBCD and ran a file-level copy of his data to the drive, and that went fine. At this point I was thinking the best option would be to create a new account for him, copy everything over, and then delete the old one. But for good measure I decided to do a partition copy first. About a quarter of the way in, the partition copy popped up a block read error. Huh. So I started it again. Block read error in the same spot.
I booted back into Safe Mode, scheduled a Scan Disk with ‘Bad Sector Scan’ enabled, and rebooted. I watched it long enough to catch the first error, which turned out to be a bad chunk in NTUSER.DAT, the file that holds the user’s registry hive. Imagine that.
I let the scan run overnight, and everything was fine the next day: login times were normal and the user could log into his original account.
I returned the machine with the usual admonishments about regular backups, and a warning that this could be a harbinger of more bad sectors to come if the drive was failing, or could have been a one-time event.
Either way, it left me wondering why Windows didn’t force a Scandisk (unless it did early on and he missed it), or why SMART didn’t detect an issue. The lesson here is one I apparently have to keep having hammered home: let the troubleshooting guide you. I let my preconceived notion (older user with computer = virus) bias my troubleshooting process, costing me hours across multiple virus scans. If I had started backing up as soon as I got home, I might have caught it sooner.
Short version: MultiPoint Server and Network Level Authentication in the SCCM 2012 remote control client do not play well together when using USB-connected zero-client devices.
When working with MultiPoint Server, remote control can be an issue. At the one site where I have many MP servers, we opted to use MultiPoint Manager. It isn’t integrated into any other console, but it works. Another option would be to assign each client its own IP address and remote into them that way.
At this particular client we deployed 75 MultiPoint Server 2011 machines across 12 buildings (a school district). All was well until we upgraded to SCCM 2012. Suddenly we started getting odd screen issues: upon boot the screen would come up, but the screen saver would not turn off. Moving the mouse just caused the screen saver graphic to jitter around the screen. Booting the server in Safe Mode or connecting via RDP were the only ways to log in. We could find no errors or events related to the problem in the main log files.
After a day we decided to move our one MPS 2012 box over to SCCM 2012. Immediately on refreshing the client after the upgrade, each station threw an error.
A quick look at the Remote Control settings in the SCCM client showed that, sure enough, they were set to require Network Level Authentication. Once this was turned off in all of the client policies and the clients were refreshed, operation returned to normal.
It started at a K-12 client of mine: a failure at a random point in the OS WIM download during OSD. PXE boot or USB boot, it didn’t matter. Sometimes you could try again and it would work like a champ.
Further investigation into the logs showed that the task sequence would fail to download the file, retry, and fail again. Everything we looked at pointed to the network.
Then it happened at another client. Same issue, different network infrastructure, and it still looked like a network issue. However, while troubleshooting I noticed that if I ran a quick “wpeutil disablefirewall” on the client when the OSD process started, it seemed to image successfully. A problem with the WinPE 3 firewall? I couldn’t see how that would be the case. I considered modifying the task sequence to run that command first, disabling the firewall before anything else, but ultimately ran out of time at that client.
Finally it happened at an Enterprise client that had a great CCIE on staff. He came to me with reports of port flapping that started shortly after the imaging process began to fail. Some troubleshooting at the network level revealed that, in some cases, existing clients were trying to communicate with the MAC addresses of machines that were imaging. This was ultimately causing the switch to shut down the ‘rogue’ port for a period of time and then bring it back online.
Some perusing around the Cisco forums led us to this gem:
If you don’t know about “Wake Up Proxy” and how it works, I suggest you go give it a read. It’s got great info.
In short, here’s what was happening: when a machine “known” to SCCM was re-imaged, as we were doing dozens of times a day during the testing phase, one of the “subnet guardian” clients would decide that the machine being imaged must be asleep, since it could not ping it (the WinPE firewall was blocking the ping). So the guardian machine would start to assume its MAC address. The important bits from the link:
“The redirection is achieved by the manager computer broadcasting an Ethernet frame that uses the sleeping computer’s MAC address as the source address. This makes the network switch behave as if the sleeping computer has moved to the same port that the manager computer is on. The manager computer also sends ARP packets for the sleeping computers to keep the entry fresh in the ARP cache. The manager computer will also respond to ARP requests on behalf of the sleeping computer and reply with the MAC address of the sleeping computer.”
Because the switch was actually receiving the same MAC on two different ports, it shut down the apparent ‘rogue’ port, causing the WIM download to error out (since, I assume, the download was the only thing big enough to be sensitive to the temporary port shutdowns).
Once we turned off “Wake Up Proxy” in the client settings and let that percolate to all of the clients, we were able to image again without issue. I assume that turning off the WinPE firewall at the beginning of the task sequence would also take care of it, but I haven’t had a chance to test that.
ADDENDUM: Adding a "Run Command Line" step of 'wpeutil disablefirewall' does indeed make the machine pingable again during initial OSD. I put mine right after the partition disk section. I can't yet confirm whether this fixes the issue above, though: at the site I'm working at as I write this, they fixed it at the switch, and we currently have "Wake Up Proxy" turned off. I'll update once we get it turned back on.
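For anyone who wants to see the failure mode mechanically, here is a toy Python model of a learning switch with a MAC-flap guard. Everything in it is an assumption for illustration: the flap policy (three port moves within ten seconds shuts the offending port for 30 seconds), the port numbers, and the MAC address are hypothetical, and real switch features such as port security, MAC move notification, and errdisable recovery differ in the details. The guardian starts out answering for the "sleeping" machine, and then the imaging machine and the guardian take turns sending frames with the same source MAC.

```python
from collections import defaultdict, deque


class LearningSwitch:
    """Toy model of a switch MAC address table with a simple MAC-flap guard.

    Hypothetical policy, not any vendor's actual algorithm: if a source MAC
    moves between ports `threshold` times within `window` seconds, the port
    the triggering frame arrived on is disabled for `recovery` seconds.
    """

    def __init__(self, threshold=3, window=10.0, recovery=30.0):
        self.threshold = threshold
        self.window = window
        self.recovery = recovery
        self.mac_table = {}              # MAC -> port it was last learned on
        self.moves = defaultdict(deque)  # MAC -> timestamps of recent port moves
        self.disabled_until = {}         # port -> time the port comes back up

    def port_is_up(self, port, now):
        return self.disabled_until.get(port, float("-inf")) <= now

    def receive(self, port, src_mac, now):
        """Learn from a frame's source MAC; return False if the frame is dropped."""
        if not self.port_is_up(port, now):
            return False
        previous = self.mac_table.get(src_mac)
        if previous is not None and previous != port:
            recent = self.moves[src_mac]
            recent.append(now)
            while recent and now - recent[0] > self.window:
                recent.popleft()  # forget moves that fall outside the window
            if len(recent) >= self.threshold:
                # MAC is flapping: shut the port this frame arrived on for a while.
                self.disabled_until[port] = now + self.recovery
                recent.clear()
                return False
        self.mac_table[src_mac] = port
        return True

    def egress_port(self, dst_mac):
        """Port that frames addressed to dst_mac go out of (None means flood)."""
        return self.mac_table.get(dst_mac)


IMAGING_MAC = "00:11:22:33:44:55"  # hypothetical MAC of the machine being imaged
IMAGING_PORT, GUARDIAN_PORT = 1, 2

switch = LearningSwitch()
# The guardian is already answering for the "sleeping" machine (t=0); then the
# imaging machine and the guardian alternate frames with the same source MAC.
events = [(0, GUARDIAN_PORT), (1, IMAGING_PORT), (2, GUARDIAN_PORT), (3, IMAGING_PORT)]
for t, port in events:
    forwarded = switch.receive(port, IMAGING_MAC, now=t)
    print(f"t={t}s frame on port {port}: forwarded={forwarded}, "
          f"traffic for the imaging machine goes out port {switch.egress_port(IMAGING_MAC)}")
```

In this run the imaging machine's second frame (t=3) trips the guard, so port 1 is shut for 30 seconds while the table still sends the imaging machine's traffic out the guardian's port. In a model like this, small timing differences change which port gets shut down, so treat it as an illustration of the mechanism rather than a prediction of what a given switch will do.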
Deployed SP1 for SCCM 2012 and suddenly things aren’t what they seem?
Distributions look OK in the console but aren’t working? Imaging deployments failing during the image download? Odd errors in distmgr.log like:
Failed to set share security on share \\sccm2012.domain.com\SMSSIG$. Error = 5
Failed to set access security on share SMSSIG$ on server sccm2012.domain.com
DPConnection::Disconnect: Revert to self
Cannot find or create the signature share.
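If you want to check quickly whether your own distmgr.log shows the same symptoms, a small Python filter like the one below will pull out the matching lines. It is a sketch: the patterns are just the messages quoted above, the `$$<…>` trailer stripping assumes the older SMS-style log line format, and the log path you pass in is up to you. For what it's worth, Win32 error 5 is ERROR_ACCESS_DENIED, which is consistent with the account-permission fix that follows.

```python
import re
import sys

# Messages taken from the distmgr.log excerpt above; extend the list as needed.
SUSPECT_PATTERNS = [
    re.compile(r"Failed to set (share|access) security", re.IGNORECASE),
    re.compile(r"Cannot find or create the signature share", re.IGNORECASE),
    re.compile(r"Error = 5\b"),  # Win32 error 5 is ERROR_ACCESS_DENIED
]


def strip_trailer(line):
    """Drop the '$$<component><time><thread>' trailer (assumed SMS-style log format)."""
    return line.split("$$<", 1)[0].rstrip(" ~\r\n")


def scan_log(lines):
    """Return (line_number, message) for every line matching a suspect pattern."""
    hits = []
    for number, raw in enumerate(lines, start=1):
        message = strip_trailer(raw)
        if any(pattern.search(message) for pattern in SUSPECT_PATTERNS):
            hits.append((number, message))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_distmgr.py <path to distmgr.log>
    # Adjust the encoding if your log files are not plain UTF-8/ASCII text.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for number, message in scan_log(log):
            print(f"{number}: {message}")
```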
We saw that and more, and puzzled over it for a few days until I finally ran across this thread:
In our case the Site System account was already set to use the computer account. I did two things, so either one could have fixed it:
Set the Site System account as a local admin (we only had a primary site server)
Changed the Site System account over to a network account, then changed it back to the computer account.
Once I did that and rebooted the server for good measure, imaging and everything else took off again. Luckily we’re in pre-production on SCCM 2012, so no production clients were affected.
Addendum: Once this is done, it's probably a good idea to delete and re-create your boot images. Remove them from distribution, delete them, then re-create and re-distribute them. Check smspxe.log for errors; it should be clean. We were seeing errors that appeared to be related to distributing a .ttf file. Once we deleted and re-distributed, the PXE logs were clear.
If you have to configure this, do yourself a favor and stop by here first. Save yourself some searching and head-scratching over the permissions needed to connect the management console, particularly when both machines are in workgroups (your lab, for instance).
If you're going to work for a startup, or even a relatively new company, don't be afraid to ask about things like funding, capital, revenue, and profitability. It could save you some ginormous headaches.
Beyond, of course, all of the amazing content I’ll miss and friends I won’t see:
All of the vendors pinging me to stop by and see them and pick up their swag while I’m at MMS. Just taunting me.
Passed. Again, I’ll call out the guys below who do a great job on the course.
If my iPhone ever shows up. Note it’s “Free for a limited time”!
By now we all know that ITIL is a necessary evil. Or perhaps not evil to you, but something we all need to be aware of to varying degrees, depending on your organization and place in it.
I’ve been through a couple of rounds of ITIL training, both for V2 and the update. We just finished the V3 Foundation course, and the facilitator was hands down the best I’ve seen for ITIL training. The fine folks at svcmgtdynamix.com do an excellent job of helping you wrap your head around the concepts and acronyms while linking them back to your everyday reality. I’ve had instructors for this material who just “go through the slides”, and these guys are far, far better. I’d recommend them to anyone seeking any of the ITIL training services they offer.
I’m leaving my position of 11+ years to pursue opportunities a little closer to home. Hopefully I won’t leave the Systems Management community too far behind – I’m trying to sell them on SCOM – but my duties are certainly going to change and broaden as I move from a firm with dozens of offices in 20+ states to a firm specializing in hosted services.
Regardless, I won’t leave the MyITForum community behind, and I’ll post interesting technical issues here as they arise.
There are many ways (and reasons) to create dynamic groups in SCOM. Just a quick post with links that outline some of the more common ways to do it, based on various criteria.
Great Webcast on using a Formula!