April 2007 - Posts

Summary: client deployments (new installs or upgrades) are a critical part of the life of any SMS administrator. Whether you're rolling out your first infrastructure or upgrading a well-established one, you can't really take advantage of SMS until you've done most of the client deployments. But how do you verify the client deployments are going well?

Traditionally you did the monitoring according to the method you used. If you did upgrades via software distribution, then you watched software distribution status messages. If you used push installation, then you watched the push logs and file counts. If you did it via logon script or computer startup script, then you rolled your own solution. In all cases you watched the system resource activity (DDRs, as seen in v_R_System). So you had options, but they only told you part of the story, and you couldn't use one solution for all methods.

Now with SCCM 2007 we have a great solution in the form of the Fallback Status Point, whose data rolls up very neatly to the v_ClientDeploymentState view at your central site. What makes it so great? Well, it's universal - all client deployment methods can use it. More importantly, it's more granular - you can see who tried what and who succeeded, at both the client installation and site assignment stages. If reboots are needed, you can track that too. Failure messages are very detailed.

From SMS 2003 you may be used to views like v_ClientAdvertisementStatus or v_GS_PatchStatusEx. They were brilliant in that they gave you the latest status for advertisements or patch deployments. If you only had a few thousand clients or fewer you could always get that information from the original SMS views, but if you had more clients then those original views were prohibitively slow. v_ClientAdvertisementStatus and v_GS_PatchStatusEx accumulated the results as they rolled in, so when you ran your reports they told you exactly what you needed right away, even if you had a couple hundred thousand clients.

v_ClientDeploymentState now does the same thing as v_ClientAdvertisementStatus and v_GS_PatchStatusEx. Complete, fast, fully detailed client deployment status: brilliant. TechSexy!

Ok, enough background - how are you going to use it? Well, there are some built-in SCCM reports that may do the job for you. But chances are you're going to want to dive deeper, sooner or later. So:

To start with, you'll want to watch the key deployment phases. The following queries do that, including percentages. Each phase has the potential to fail, so they won't likely all give "100.0%" as the result, and likely each phase will be a little less successful than the previous one:

-- how many clients have reported deployment state (the denominator for the percentages below)
declare @total as integer
select @total=count(*) from v_ClientDeploymentState where LastMessageStateID is not null and SMSID is not null
select @total 'clients'

-- how many clients reached each phase; nullif() returns NULL instead of a divide-by-zero error if @total is 0
select count(*) 'deployment started', count(*)*100.0/nullif(@total,0) '%'  from v_ClientDeploymentState where DeploymentBeginTime is not null
  and LastMessageStateID is not null and SMSID is not null
select count(*) 'deployment done', count(*)*100.0/nullif(@total,0) '%'  from v_ClientDeploymentState where DeploymentEndTime is not null
  and LastMessageStateID is not null and SMSID is not null
select count(*) 'assignment started', count(*)*100.0/nullif(@total,0) '%'  from v_ClientDeploymentState where AssignmentBeginTime is not null
  and LastMessageStateID is not null and SMSID is not null
select count(*) 'assignment done', count(*)*100.0/nullif(@total,0) '%' from v_ClientDeploymentState where AssignmentEndTime is not null
  and LastMessageStateID is not null and SMSID is not null
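For anyone who wants to sanity-check the percentage math away from SQL Server, here's a minimal Python sketch of the same funnel. It's an illustration under assumptions, not a way to query SCCM: the rows are hypothetical in-memory dictionaries named after the view's columns, they're assumed to be pre-filtered the way the SQL filters them (LastMessageStateID and SMSID not null), and a missing key or None stands in for NULL.

```python
# Minimal sketch: phase-by-phase funnel over hypothetical in-memory rows.
# Each row is a dict standing in for one v_ClientDeploymentState record;
# a missing key or None plays the role of SQL NULL.
STAGES = [
    ("deployment started", "DeploymentBeginTime"),
    ("deployment done", "DeploymentEndTime"),
    ("assignment started", "AssignmentBeginTime"),
    ("assignment done", "AssignmentEndTime"),
]


def phase_funnel(rows):
    """Return (label, count, percent) per stage, like the SQL above."""
    total = len(rows)
    funnel = []
    for label, column in STAGES:
        count = sum(1 for row in rows if row.get(column) is not None)
        # Guard against an empty input instead of dividing by zero.
        percent = count * 100.0 / total if total else 0.0
        funnel.append((label, count, percent))
    return funnel


if __name__ == "__main__":
    # Placeholder integers stand in for real timestamps.
    sample = [
        {"DeploymentBeginTime": 1, "DeploymentEndTime": 2,
         "AssignmentBeginTime": 3, "AssignmentEndTime": 4},
        {"DeploymentBeginTime": 1, "DeploymentEndTime": 2,
         "AssignmentBeginTime": 3},
        {"DeploymentBeginTime": 1},
    ]
    for label, count, percent in phase_funnel(sample):
        print(f"{label}: {count} ({percent:.1f}%)")
```

Each stage is measured against the same total, as in the SQL, so you can read the drop-off between consecutive stages directly.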

Extending those queries a bit will give you the count of computers that fail in each stage. For example, for the deployment stage (run it in the same batch as the queries above, since it uses @total):

select count(*) 'deployment failed', count(*)*100.0/nullif(@total,0) '%'  from v_ClientDeploymentState where DeploymentBeginTime is not null and DeploymentEndTime is null
  and LastMessageStateID is not null and SMSID is not null

Better yet, you can get the details on why they failed: 

select StateDescription, LastMessageStateID, LastMessageParam, count(*) 'clients'
from v_ClientDeploymentState
where DeploymentBeginTime is not null and DeploymentEndTime is null
group by StateDescription, LastMessageStateID, LastMessageParam
order by 4 desc

Or get the computer names so you can investigate them to confirm the exact details:

select NetBiosName from v_ClientDeploymentState where DeploymentBeginTime is not null and DeploymentEndTime is null
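The same filtering and grouping can be sketched in Python over hypothetical rows (dictionaries keyed by the view's column names, with None or a missing key standing in for NULL). The function names are made up for illustration; this only mirrors the logic of the failure queries above and doesn't talk to SCCM.

```python
from collections import Counter


def failed_deployments(rows):
    """Rows where deployment began but never finished (the NULL checks above)."""
    return [row for row in rows
            if row.get("DeploymentBeginTime") is not None
            and row.get("DeploymentEndTime") is None]


def failure_reasons(rows):
    """Failed clients per (StateDescription, LastMessageStateID, LastMessageParam),
    most common first, like GROUP BY ... ORDER BY 4 DESC."""
    reasons = Counter(
        (row.get("StateDescription"), row.get("LastMessageStateID"),
         row.get("LastMessageParam"))
        for row in failed_deployments(rows))
    return reasons.most_common()


def failed_names(rows):
    """NetBiosName of each failed client, for follow-up investigation."""
    return [row.get("NetBiosName") for row in failed_deployments(rows)]
```

Counter.most_common() returns the groups sorted by count, largest first, which matches the "order by 4 desc" in the SQL.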

If you poke around in the view you'll see that there are success states as well as failure states, so you might think you could base your success reports on the relevant success state messages. However, some exception state message IDs, like 401 and 402, skew those results. So when you're looking at anything other than failures, you're better off looking at the relevant stage columns (DeploymentBeginTime, for example) rather than the state message IDs.

Another interesting avenue of investigation is the length of time that the successful deployments take. Most are fast, but the slow ones are worth investigating:

-- duration (in seconds) of each deployment; datediff returns NULL for deployments that haven't finished
if object_id('tempdb..#temp1') is not null drop table #temp1
select DeploymentBeginTime 'beginTime', DeploymentEndTime 'endTime', datediff(second,DeploymentBeginTime, DeploymentEndTime) 'duration' into #temp1 from v_ClientDeploymentState

declare @withduration as integer
select @withduration=count(*) from #temp1 where duration is not null
select min(duration) 'fastest (seconds)' from #temp1
select max(duration)/3600.0 'slowest (hours)' from #temp1
-- 100.0 (not 100) avoids integer division, so small percentages don't round down to 0
select count(*) 'more than an hour', count(*)*100.0/nullif(@withduration,0) '%' from #temp1 where duration>3600
select count(*) 'more than 3 hours', count(*)*100.0/nullif(@withduration,0) '%' from #temp1 where duration>3600*3
select count(*) 'more than 12 hours', count(*)*100.0/nullif(@withduration,0) '%' from #temp1 where duration>3600*12
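Here's a rough Python equivalent of those duration queries, again over hypothetical rows whose DeploymentBeginTime and DeploymentEndTime values are assumed to be datetime objects. One small difference: DATEDIFF(second, ...) counts second boundaries and returns an integer, while timedelta.total_seconds() returns the exact elapsed time as a float, so the two can differ by a fraction of a second.

```python
from datetime import datetime


def duration_summary(rows):
    """Summarize finished deployments, mirroring the #temp1 queries.
    Rows missing either timestamp are skipped, just as DATEDIFF yields NULL."""
    durations = [
        (row["DeploymentEndTime"] - row["DeploymentBeginTime"]).total_seconds()
        for row in rows
        if row.get("DeploymentBeginTime") is not None
        and row.get("DeploymentEndTime") is not None
    ]
    if not durations:
        return None

    def over(hours):
        # (count, percent) of deployments that took longer than `hours`
        count = sum(1 for seconds in durations if seconds > hours * 3600)
        return count, count * 100.0 / len(durations)

    return {
        "fastest_seconds": min(durations),
        "slowest_hours": max(durations) / 3600,
        "over_1h": over(1),
        "over_3h": over(3),
        "over_12h": over(12),
    }


if __name__ == "__main__":
    start = datetime(2007, 4, 2, 8, 0)
    sample = [
        {"DeploymentBeginTime": start, "DeploymentEndTime": datetime(2007, 4, 2, 8, 5)},
        {"DeploymentBeginTime": start, "DeploymentEndTime": datetime(2007, 4, 2, 12, 0)},
        {"DeploymentBeginTime": start},  # never finished, so it's skipped
    ]
    print(duration_summary(sample))
```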

How many clients need reboots to complete the client installation, and why? (Note that the 401 and 402 messages track the reboot needed/happened status.)

select StateDescription, LastMessageParam, count(*) from v_ClientDeploymentState where RebootNeeded='*' group by StateDescription, LastMessageParam order by 3 desc

Which sites are the clients being assigned to?

select AssignedSiteCode, count(*) from v_ClientDeploymentState where AssignedSiteCode is not null and AssignedSiteCode<>'' group by AssignedSiteCode order by 2 desc

I'm sure you're starting to see what this view makes possible.

p.s.: "TechSexy" probably means different things to different people, but to me TechSexy stuff is SCCM changes that make my life better, in any substantial sense. I can show them to anyone that knows SMS and they'll know that they're 'cool'. Not necessarily cool in the sense of Zune or Xbox 360, but cool in that they make computer management more 'vital and alive'. Responsive to my real needs. My life is better in a big way.

SCCM has plenty of new features that the marketing guys will tell you about that make SCCM 2007 more valuable to you than SMS 2003. I'm excited about those features, will use them extensively, and they help the Microsoft IT team to deliver value to Microsoft well beyond the costs of running our infrastructure and the team. But "features" are one thing and "TechSexy" is another. Oh, and they can't be in SMS 2003, because then they wouldn't be what makes SCCM uniquely TechSexy. So when I share TechSexy ideas, they are fun SCCM ideas you might not find anywhere else, baby!

(Then again, defining cool is not cool. So I won't do it again)

 

SMS 2003 SP3 is finally available to everyone, at the links below. That took longer than you and I might have hoped, but I did warn you in the previous posts. There's usually some delay, and this felt a little longer than usual, but maybe it was just a 'watched pot' problem. In any case, I do include those disclaimers for a reason.


Upgrade:
http://www.microsoft.com/downloads/details.aspx?familyid=3992a556-32e3-49ab-b734-6341d208c66d&displaylang=en&tm

Evaluation:
http://www.microsoft.com/downloads/details.aspx?familyid=7f07bfe0-5874-4444-8eb6-cbdb80f6c921&displaylang=en&tm

Today's disclaimers: the above URLs might not work for you, for a number of reasons. Most commonly due to cache update issues, but also potentially due to overloaded connections. Or maybe they'll decide to move them somewhere. Be sure to check out www.microsoft.com/sms, or links from real-time postings by other community members at myITforum.com or the newsgroups.

 

Posted by pthomsen | with no comments

Rumor has it that the pop-up turkey thermometer of SP3 has popped!

I haven't heard anything about when it will be served to your plate (i.e. via whatever channel you like, such as microsoft.com/SMS). It all depends on how fast the electric carving knife is working, and whether it's already busy with a ham or something. (I'm really stretching this metaphor, and I hope it's understandable outside of North America...)

Posted by pthomsen | with no comments

Yes, SMS 2003 SP3 is due to be posted this Friday, April 27th (yes, 2007). The most reliable of sources has confirmed it.

I don't know any other details. For example, will Premier customers get the first crack at it? At what time of day will it appear? But at least some people will get their hands on it this Friday.

Of course there are the usual disclaimers. Maybe the product team will release it on Friday but the microsoft.com guys won't actually get it publicly available until Saturday or so. Or maybe our ever diligent professional testers will find some last minute BIG BUG. That's VERY unlikely given how much testing has already been done, but anything is possible.

Posted by pthomsen | 6 comment(s)

Summary: for those packages that are applicable to lots of clients, keeping them on most of your distribution points (DPs) is common sense. But for old or obscure packages it's easy to forget about them and they may end up missing on a lot of DPs. That can cause unexpected pain. Finding packages in this state can be tricky.

When you're deploying Office to your whole company, you no doubt keep a close eye on the package in the first couple of months to ensure it's on all DPs and stays there. But what about your hardware inventory extension package, or some quick fix that was important long ago? With the ebb and flow of things the package may end up on relatively few DPs, and yet it could still be applicable (at least at the collection level) to lots of clients.

So what happens in this case? Well, obviously, the advertisement can't run on the targeted clients because the package is missing. If it's an old or obscure package, nobody may care, so that in itself may not be painful. In fact everyone might say "who cares?". But what is happening is that the clients are sending location requests to the management points (MPs) asking "where is this package?". The MP can't give them an answer because it doesn't know. An hour later each client will ask again. This will continue forever, multiplied by the number of clients and by the number of packages missing from DPs. If you've got 10,000 clients at the site and 12 packages in that state, that's 120,000 location requests per hour: 2,000 per minute, or about 33 per second. Maybe not a huge workload, but with all the good activity going on (inventory uploads, status uploads, policy downloads, etc.), can you really afford to have all this wasted activity on your servers and network? And if the MPs in question are proxy MPs (or if they're standalone MPs without their own database replicas), then the primary site is going to get all the SQL Server hits to answer those location requests (location requests are not cached at proxy MPs).
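The arithmetic in that paragraph is simple, but it's handy to plug in your own numbers. Here's a minimal Python sketch, assuming (as described above) that each client retries once an hour for every missing package. The requests_per_hour parameter is a hypothetical knob for trying other retry rates, not an SMS setting.

```python
def location_request_load(clients, missing_packages, requests_per_hour=1):
    """Unanswered location requests from clients that keep asking the MP
    where a package is, when no DP they can use has it.
    Returns (per hour, per minute, per second)."""
    per_hour = clients * missing_packages * requests_per_hour
    return per_hour, per_hour / 60, per_hour / 3600


if __name__ == "__main__":
    hour, minute, second = location_request_load(10_000, 12)
    # prints "120,000 per hour, 2,000 per minute, 33 per second"
    print(f"{hour:,} per hour, {minute:,.0f} per minute, {second:.0f} per second")
```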

Of course maybe you don't have 10,000 clients at any of your sites. Or you have multiple MPs at the site. But you might also have a lot of packages in this state, over time. Or you might have a lot of good activity hitting the MPs (other advertisements and data uploads). So this is an issue that can be important to a lot of us.

The best practices are to review all your advertisements regularly and make sure you expire or delete those that aren't useful anymore. Make sure the packages are on all the DPs where clients will need them. Make sure the collections only target clients that the package would still be useful on. As professional administrators we understand that everything has a lifecycle, and the cleanup part of the lifecycle is just as important as the creation or maintenance parts. Right? Well, I hope that's true, but most people I talk to admit that cleanup is one of those things that is easily forgotten. Much like documentation or security, I'm afraid.

And how do packages end up missing on relevant DPs? There's a bunch of scenarios for that. DP changes are the most likely case - you rebuild a DP, or replace it, for whatever reasons. Maybe you move it from one site to another. Or you move the site from one hierarchy to another. Maybe you expand the scope of the collection but forget to add some of the DPs. Maybe relevant clients come to a site that you didn't expect to have relevant clients. The list goes on.

Or how does an advertisement that has already run on a high percentage of your clients later become applicable to a lot of clients again? Well, the clients could be rebuilt or repaired, so that the same box now has a new SMS client. Or maybe the advertisement is a recurring advertisement. Or maybe you add a bunch of machines that meet the collection's criteria but you forgot about that collection. Maybe they got moved from another hierarchy. There are plenty of ways that a lot of clients could 'sneak up on you'.

So it's easy to get into this situation, and if you've got a smallish number of objects and a reasonable amount of spare time, you can manually review them in the console on a regular basis. Cool. But what if neither of those conditions is true? I offer the following SQL script:

-- advertisements targeted at a large fraction of clients but whose packages are on a small fraction of DPs
declare @client_count as integer
select @client_count=count(distinct name0) from v_R_System where client0=1 and obsolete0=0
select @client_count 'total clients'
declare @possible_DPs as integer
select @possible_DPs = count(distinct ServerNALPath) from v_DistributionPoint -- number of possible DPs
select @possible_DPs 'possible DPs'

select CAS.AdvertisementID, Ads.AdvertisementName, Ads.PackageID, count(*) 'clients',
 (select count(*) from v_DistributionPoint where PackageID=Ads.PackageID) 'DPs'
from v_ClientAdvertisementStatus CAS join v_Advertisement Ads on CAS.AdvertisementID=Ads.AdvertisementID
where AssignedScheduleEnabled<>0 and ExpirationTimeEnabled=0 and
 -- "not in ... >= 75%" rather than "in ... < 75%": a package that is on no DPs at all
 -- has no rows in v_DistributionPoint, so only this form catches it
 Ads.PackageID not in
 (select PackageID from v_DistributionPoint group by PackageID having count(*) >= .75*@possible_DPs)
group by CAS.AdvertisementID, Ads.AdvertisementName, Ads.PackageID
having count(*) > .25*@client_count
order by 5 -- fewest DPs first

It's not the simplest SQL query, but if you follow it line by line you'll see it's not 'rocket science' either. Basically it finds advertisements that are applicable to more than one quarter of your clients and whose packages are missing from more than one quarter of your DPs. My thinking is that if it's applicable to more than one quarter of your clients, then it is probably applicable at most of your sites. And if it's missing from more than one quarter of your DPs, then it's missing from enough of them to be painful. In truth, both assumptions could be wrong. If I solved this problem with VBScript I could account for both issues (and other issues, such as expiration), but it would be a complicated script. And in this case it's good enough to give a short list of suspect packages which can quickly be reviewed manually.

(On the other hand, it's got two kinds of subselects - how often do you see that? ;-) And a HAVING clause, variables, and ordering by a column number. So it's a good chance to learn a bit more about SQL, for those that are new to SQL)
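If you'd rather see the heuristic without the SQL, here's a small Python sketch of the same test. The input tuples are hypothetical (advertisement ID, targeted clients, DPs that have the package), and the 25% and 75% thresholds are the ones from the query, exposed as parameters so they're easy to tune.

```python
def suspect_advertisements(ads, total_clients, possible_dps,
                           client_share=0.25, dp_share=0.75):
    """Flag advertisements targeted at more than client_share of all clients
    whose package is on fewer than dp_share of the possible DPs.

    ads: iterable of (advertisement_id, targeted_clients, dps_with_package).
    Returns the flagged tuples, fewest DPs first (like ORDER BY 5)."""
    flagged = [
        (ad_id, clients, dps)
        for ad_id, clients, dps in ads
        if clients > client_share * total_clients
        and dps < dp_share * possible_dps
    ]
    return sorted(flagged, key=lambda ad: ad[2])


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 clients and 10 DPs in total.
    ads = [("AD1", 500, 2), ("AD2", 100, 1), ("AD3", 600, 10), ("AD4", 400, 0)]
    # prints [('AD4', 400, 0), ('AD1', 500, 2)]
    print(suspect_advertisements(ads, total_clients=1000, possible_dps=10))
```

Note that "AD4", whose package is on no DPs at all, is flagged here; the "not in" form of the SQL query is what lets the query catch that case too.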

Update (4/25/07):

I came up with a better SQL query for the same purpose. The previous query is good to ensure that enough DPs are available for the anticipated workload, but this one is even better for finding sites where the clients will send the location requests that will be unanswered indefinitely. It's sorted so that the worst site / advertisement combinations are at the top of the list, so you can deal with the worst offenders first. And it's actually a simpler query so easier to understand.

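-- clients targeted by each advertisement, per assigned site, where no DP at that site has the package
select CAS.AdvertisementID, AdvertisementName, PackageID, SMS_Assigned_Sites0, count(*) 'targeted clients'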
from v_ClientAdvertisementStatus CAS join v_RA_System_SMSAssignedSites ass on CAS.resourceID=ass.resourceID join v_Advertisement Ads on CAS.AdvertisementID=Ads.AdvertisementID
where AssignedScheduleEnabled<>0 and ExpirationTimeEnabled=0 and
    SMS_Assigned_Sites0 not in
    (select SiteCode from v_DistributionPoint where PackageID=Ads.PackageID)
group by CAS.AdvertisementID, AdvertisementName, PackageID, SMS_Assigned_Sites0
order by 5 desc
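The per-site logic is easy to sketch in Python too. This assumes hypothetical inputs: one (advertisement ID, package ID, assigned site) tuple per targeted client, plus a mapping from package ID to the set of site codes that have a DP holding it. A package that's on no DPs at all simply has no entry in the mapping, so every site targeting it is reported.

```python
from collections import Counter


def unserved_site_targets(targets, dp_sites_by_package):
    """Count targeted clients whose assigned site has no DP holding the package.

    targets: iterable of (advertisement_id, package_id, assigned_site),
             one entry per targeted client.
    dp_sites_by_package: package_id -> set of site codes with a DP for it.
    Returns ((advertisement_id, package_id, site), clients) pairs, worst first."""
    counts = Counter(
        (ad_id, package_id, site)
        for ad_id, package_id, site in targets
        if site not in dp_sites_by_package.get(package_id, set())
    )
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical IDs and site codes.
    targets = [("AD1", "PKG1", "S01")] * 3 + [("AD1", "PKG1", "S02")] * 5
    print(unserved_site_targets(targets, {"PKG1": {"S02"}}))
```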

Summary: most questions that I get that can be answered with SQL queries are 'now' questions - how many Vista clients do we have at this point in time? How many SCCM v4 Beta 2? How many are online? But sometimes we need to know when data (reflecting activity) most recently changed. I give two examples: a specific problem I solved today, and a solution I developed years ago but that is still useful to all of us today.

Here's a teaser: the output of the last query in this posting, showing the hardware inventory loading workload over an average week. Notice how well it corresponds with business hours, and how it's especially heavy on Thursdays:

[Figure: hardware inventory workload over an average week]

Today's example is that we recently moved a lot of clients from one SCCM site to another. An attentive internal customer (don't you hate those guys? ;-) ) noticed that we had more clients at the source site than the central site was reporting for that site, and he didn't understand why we would have that difference. For us SMS guys it makes sense, but I can see how it would be non-intuitive. I explained the theory, but I don't expect people to just take my word for it - some analysis of the data that proves the point reassures them and increases my credibility. So I like to prove that my theory is backed up by the data.

In this case, I wrote a query that found all the clients that had been moved (that were at the child site and not at the central site), and got the latest discovery data (DDR) dates for those clients. Then I extracted just the month and day from each date (ignoring the hours and minutes, since I didn't need the data to be that granular), grouped the records by date to get a count per date, copied that data to Excel, and did a quick graph of the numbers. The graph demonstrated that the child site had indeed not received data from those excess clients since we moved them. So these excess clients were just remnants from the past, would be purged when they had been idle for 30 days, and could indeed be ignored in the meantime (as we normally do). (All of this took an hour, so it's not trivial, but it's not overly complex either.)

The key point of this effort was the graph that proved that the clients had not been heard from at the child site since they had been moved elsewhere. The Y axis of the graph was client counts (clients that last reported DDR activity on a given date) and the X axis was the dates for the last 30 days (when the clients had reported the DDR activity). Getting that X axis is the real trick.

That was done using the following query. It's a bit complex, but the key point is the "datepart" functions in the first line. For each record we keep just the month and day (in the #temp temporary table), not the hours, minutes, and seconds that we would normally get from "agenttime" (in this case). So we get lots of duplicates for each day: one for each client that last reported on that day. The rest of the query gets the discovery data but ignores data from any AD discovery methods or from DDM itself (which would be server-side discovery methods like 'inventory discovery').

-- keep only the month and day of each client's latest client-side discovery time
select datepart(month,agenttime) 'month', datepart(day,agenttime) 'day' into #temp from <source site server>.SMS_<source site sitecode>.dbo.v_R_System sys
-- latest discovery time per resource, skipping DDM (server-side discovery) and AD discovery records
join (select ResourceId, MAX(AgentTime) as AgentTime from <source site server>.SMS_<source site sitecode>.dbo.v_AgentDiscoveries where agentname<>'SMS Discovery Data Manager' AND agentname not like '%!_AD!_System%' ESCAPE'!' group by ResourceId) disc on disc.resourceid=sys.resourceid
where client0=1 and obsolete0=0 and name0 not in
(select distinct name0 from <central site server>.SMS_<central site sitecode>.dbo.v_R_System sys join <central site server>.SMS_<central site sitecode>.dbo.v_RA_System_SMSAssignedSites ass on sys.resourceId=ass.resourceID
where client0=1 and obsolete0=0 and SMS_Assigned_Sites0='<central site sitecode>')

So now we just have to get the count of clients per day, and that can be done with this query:

select month, day, count(*) from #temp group by month, day order by month, day

The results can be copied and pasted into Excel, where you can easily graph the history.
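The core trick, throwing away the time of day and counting per date, looks like this in a minimal Python sketch. The input is assumed to be one hypothetical datetime per client (its latest DDR time), and the output is tab-separated so it pastes straight into Excel columns.

```python
from collections import Counter
from datetime import datetime


def clients_per_day(last_seen):
    """Drop the time of day and count clients per (month, day), like the
    datepart() + GROUP BY queries.  last_seen: one datetime per client."""
    counts = Counter((t.month, t.day) for t in last_seen)
    return sorted(counts.items())


if __name__ == "__main__":
    sample = [datetime(2007, 4, 3, 9, 15), datetime(2007, 4, 3, 17, 40),
              datetime(2007, 3, 30, 8, 0)]
    print("month\tday\tclients")
    for (month, day), count in clients_per_day(sample):
        print(f"{month}\t{day}\t{count}")
```

Like the SQL, this sorts by month and then day, so a 30-day window that crosses New Year's would sort January before December; adding the year to the key would fix that if it matters to you.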

That particular example is very specific to a rare problem and so in itself isn't going to be too useful to you. But I hope you followed the problem and the solution. You will likely have similar problems and thus can use similar solutions. The "key points" are most likely to be useful to you - group data by date and then you can graph activity levels over time. If you want to break it down by hour, day of the month, or similar breakdowns, the same general principles apply.

A similar (and more broadly useful) example is this query:

-- hardware inventory loads per day of the week and hour; the day-name substring assumes
-- DATEPART(dw, ...) returns 1 for Sunday (the U.S. English default, SET DATEFIRST 7)
select substring('SunMonTueWedThuFriSat',(DATEPART(dw, TimeStamp)-1)*3+1,3) + str(datepart(hh,TimeStamp)), COUNT(*)
from v_GS_Workstation_Status
group by DATEPART(dw, TimeStamp),datepart(hh,TimeStamp),substring('SunMonTueWedThuFriSat',(DATEPART(dw, TimeStamp)-1)*3+1,3) + str(datepart(hh,TimeStamp))
order by DATEPART(dw, TimeStamp),datepart(hh,TimeStamp)

This one shows hardware inventory load times by day of the week, broken down by hour within each day. In a perfect world inventory would flow in evenly throughout the week so that the server is never overly stressed. But in reality it comes in spikes of activity, depending on when the clients were first installed and when they are online. In our case that corresponds very nicely with Redmond workday hours.
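To see how the 'SunMonTueWedThuFriSat' substring trick works, here's a Python sketch of the same labeling and counting over hypothetical inventory timestamps. It assumes the SQL Server default of Sunday as day 1 for DATEPART(dw, ...); Python's weekday() counts Monday as 0, so a small helper shifts it.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = "SunMonTueWedThuFriSat"


def sql_weekday(t):
    """DATEPART(dw, t) under the default DATEFIRST 7 (Sunday=1 ... Saturday=7).
    Python's weekday() is Monday=0 ... Sunday=6, so shift it."""
    return (t.weekday() + 1) % 7 + 1


def day_hour_label(t):
    """'Thu 10'-style label: slice the 3-letter day name out of DAY_NAMES."""
    start = (sql_weekday(t) - 1) * 3
    return f"{DAY_NAMES[start:start + 3]} {t.hour:2d}"


def load_by_day_and_hour(timestamps):
    """Count timestamps per (day of week, hour), ordered Sunday 0h first."""
    counts = Counter()
    labels = {}
    for t in timestamps:
        key = (sql_weekday(t), t.hour)
        counts[key] += 1
        labels[key] = day_hour_label(t)
    return [(labels[key], counts[key]) for key in sorted(counts)]


if __name__ == "__main__":
    sample = [datetime(2007, 4, 26, 10, 5), datetime(2007, 4, 26, 10, 50),
              datetime(2007, 4, 22, 9, 0)]
    for label, count in load_by_day_and_hour(sample):
        print(label, count)
```

Slicing three characters at offset (day - 1) * 3 is the same calculation as the SQL's substring(..., (dw-1)*3+1, 3); the +1 in the SQL is there only because SQL string positions start at 1. The two-character hour padding is a bit tidier than str()'s default 10-character width.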

p.s. Why does moving clients from one child site to another child site cause a discrepancy between the central site and source site in client counts? The basic answer is that when clients move away from a site, they don't say Goodbye to the source site. They do say Hello to the destination site (sooner or later, when they do a heartbeat discovery).

The source site will purge the old client records when it runs the relevant SMS maintenance task (such as Delete Inactive Discovery Data). But that can take quite a while (30 days in my case), so in the meantime the clients can be counted at both sites if you count them at the child site level. The central site doesn't have that problem because as soon as the client reports to the destination child site, the child site will forward the DDR to the central site, and that DDR will indicate that the client is assigned to the destination child site. So the central site 'forgets' that the client was assigned to the source site and now knows that it is assigned to the destination site, which is correct. So the central site's perspective is more accurate than the child sites' perspective.

Posted by pthomsen | 1 comment(s)

Summary: in the last week of March, at MMS 2007, we talked about a huge number of subjects. Some needed some follow-up, and I'll use this posting to give general updates (personal updates have been sent via e-mails).

A new update as of 4/9/2007:

If you attended the SY12 session on SCCM security (native mode especially) and Internet-based Computer Management (IBCM) you might remember a slide where 3 technical papers were announced for using PKI with SCCM. That sounded wonderful. I'm sure I'm not the only one to hear concerns from people that setting up PKI for IBCM is going to be tricky. The clarification is that the material for those documents is already incorporated into the SCCM core documentation, so those papers aren't needed. See the SMS writer's blog for good pointers to that material.

Previous updates (as of 4/2/2007):

There was some confusion about the availability of the Configuration Manager SDK Beta 1 (not to be confused with SCCM Beta 1 - the product and its SDK are on different release cycles). SDK B1 is not an open Beta release. It's restricted to the following audience:

- TAP Customers
- TAP Partners
- Microsoft Consulting Services Management Champions
- MVPs

SCCM SDK B2 will be publicly available.

Another concern was about ITMU v3 client-side performance issues. ITMU performance issues were a hot topic last summer, and are well documented in KB 916089. I've confirmed that the product team has had few reports of performance issues with ITMU v3, and they have all turned out to be known issues, as per the KB. So please do work with CSS (formerly PSS) to confirm that the usual fixes have been successfully applied. If they do confirm issues, then they are well positioned to raise them with the product team and ensure that any additional needed fixes are provided.