SCCM Task Sequence blew up Australia’s CommBank

In “Lessons in what NOT to do with SCCM”, we learned that a misguided patch distributed through SCCM may have taken down an entire Australian banking system. Then, in “Update to the SCCM package heard round the world”, we heard about the number of desktops and servers affected, along with how HP (to whom CommBank outsources infrastructure services) is scrambling to make amends, sending HP CEO Meg Whitman into the fray.

Well, over the weekend, I was able to dig up even more of the scoop on this. No one has been able to get a clear picture of the real issue until now, partly because plenty of fingers are pointing, and partly because of pride and cover-up.

Of course, as anyone who has worked with SCCM over the years knows, SCCM picks up most of the blame when things go wrong. However, we also know that SCCM simply does exactly what you tell it to do. Still, it’s an easy target, particularly for upper-management types who really have no clue how the technology works. They think they do, but when you actually take them to task, they prove their lack of knowledge in 30 seconds or less. When that happens, it’s best to just snicker under your breath and walk away. Confronting them about it (particularly in front of others) just earns you a direct ticket onto the list of employees shown the door when times get tough.

It has been reported previously that a “patch” was the culprit at CommBank. That was a rumor: if a patch had indeed been to blame, plenty of others besides HP would be scrambling, and you’d see Microsoft onsite right beside the HP brass. Question after question in the communities has asked “what patch?” and “which one?”, because a patch that caused this at CommBank could cause problems at other companies too. Microsoft patches are built so that they will not install on a system where they are not applicable or required. Plus, if a Microsoft patch had caused the problem, Microsoft would have been helping the company roll back the errant patch. So, folks, you can rest easy: this was not a patch.

No, this was the result of a Task Sequence distributed to a custom SCCM Collection. An HP engineer had created or modified the Collection, adding a wildcard to it, and in doing so had inadvertently made it very similar in form and function to the “All Systems” Collection. The Task Sequence contained automation to (here it comes) format the disks. Yes, the disks of some 9,000 PCs and 490 servers, including domain controllers, were formatted and wiped clean.
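To make the wildcard failure mode concrete, here is a minimal Python sketch. It does not talk to SCCM at all. It only emulates how a WQL-style `LIKE` pattern (where `%` matches any run of characters and `_` matches exactly one) selects machine names. The machine names and patterns are hypothetical, and the case-insensitive matching is an assumption made for illustration.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified WQL-style LIKE pattern into a regex.

    Assumption: only the % (any run of characters) and _ (one character)
    wildcards are handled; bracket ranges are not modelled.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    return re.compile("".join(parts), re.IGNORECASE)


def members(pattern: str, machines: list[str]) -> list[str]:
    """Return the machines whose whole name satisfies the pattern."""
    rx = like_to_regex(pattern)
    return [m for m in machines if rx.fullmatch(m)]


# Hypothetical estate: branch PCs, a file server, and domain controllers.
ALL_SYSTEMS = ["BR-PC-0001", "BR-PC-0002", "HQ-FS-01", "HQ-DC-01", "HQ-DC-02"]

intended = members("BR-PC-%", ALL_SYSTEMS)  # branch PCs only
widened = members("%", ALL_SYSTEMS)         # a bare wildcard

print(intended)                  # ['BR-PC-0001', 'BR-PC-0002']
print(widened == ALL_SYSTEMS)    # True: effectively "All Systems"
```

A real SCCM collection query is WQL run against classes such as `SMS_R_System`. The sketch only models the pattern-matching step, which is enough to show how one overly broad wildcard turns a narrow collection into the whole estate.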

Right now, HP is working night and day to rectify the situation. And, of course, HP is attempting to blame SCCM, saying it’s SCCM’s fault for not raising an alert before wiping out the disks. Anything to shift blame, I guess. What did I say earlier? 30 seconds or less?
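Whatever the tool does or does not prompt, an administrator can script a sanity check before deploying anything destructive. Below is a minimal, hypothetical pre-flight check in Python. It is not an SCCM feature or API. It assumes you have already exported the current membership of the target collection and of “All Systems”. It refuses to proceed if the target is empty (membership possibly not yet updated), if the target covers more than an assumed fraction of the estate, or if it contains any machine on an assumed protected list such as servers and domain controllers. The 5% threshold and all names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PreflightResult:
    ok: bool
    reasons: list[str]


def preflight(target: set[str], all_systems: set[str], protected: set[str],
              max_fraction: float = 0.05) -> PreflightResult:
    """Hypothetical safety check before deploying a disk-wiping Task Sequence.

    target       -- machine names currently in the target collection
    all_systems  -- machine names in the "All Systems" collection
    protected    -- servers/DCs that must never receive a wipe
    max_fraction -- assumed ceiling on the share of the estate targeted
    """
    reasons = []
    if not target:
        reasons.append("target collection is empty; membership may not be updated")
    if all_systems:
        fraction = len(target & all_systems) / len(all_systems)
        if fraction > max_fraction:
            reasons.append(f"target covers {fraction:.0%} of All Systems")
    hits = sorted(target & protected)
    if hits:
        reasons.append("protected machines targeted: " + ", ".join(hits))
    return PreflightResult(ok=not reasons, reasons=reasons)


# Hypothetical estate: 100 PCs plus two domain controllers and a file server.
ALL = {f"PC{i:04d}" for i in range(1, 101)} | {"DC01", "DC02", "FS01"}
PROTECTED = {"DC01", "DC02", "FS01"}

small = preflight({"PC0001", "PC0002"}, ALL, PROTECTED)
print(small.ok)      # True: 2 of 103 machines, no protected hosts

wildcard = preflight(set(ALL), ALL, PROTECTED)
print(wildcard.ok)   # False
print(wildcard.reasons)
```

The check deliberately fails closed. Any one reason blocks the deployment, so a collection that has quietly grown into “All Systems” is stopped before a single disk is touched.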

OK, so this is the meat of the story. There’s more, particularly from a “Cloud” and datacenter perspective, that I’ll write up and share soon, as soon as I have all my notes together.


  • http://myITforum.com/myitforumwp/community/members/wright1968/ Dave Wright

    This is definitely a call to action to make sure our processes are capable of quickly responding to and averting a crisis like this, and that processes are in place to prevent it from happening in the first place. Thanks for posting.

  • http://myITforum.com/myitforumwp/community/members/scriptingit/ Bruce

    Interesting post. We have seen something similar a few times over the years. If you manually delete the query details and save, I’ve seen it create a “Select *” query, which returns all systems and is possibly what happened here. It also highlights the good practice of not managing SCCM servers with SCCM (no client on those servers), so at least you cannot kill the servers most likely to help you deal with the issue. SCCM must be respected, and it would be well suited to a management interface that automates things like collection/query logic.

  • http://myITforum.com/myitforumwp/community/members/maximillianx/ Rob Dunn

    I’m sure plenty of testing was in place, but for god’s sake, you should schedule the deployment so you have enough time to stop the advertisement, and you should update your collection membership (with 9,000 computers, maybe that would take a long time?) to see what you are about to affect before you unleash SCCM on them.

    SCCM doesn’t kill computers, PEOPLE do.