July 2008 - Posts

and the “State of the Nation” presentation video is not there. The one thing I missed due to scheduling that I was absolutely counting on the DVD to see.

I guess I’ll be re-arranging my scheduling priorities for next year.

Posted by pwstrain | with no comments

Jumping off the Microsoft train for a post.

We use Cacti for circuit / router / switch monitoring. It's free, it does everything we need it to do, and it's easy to get going if you use the CactiEZ bootable distro.

So after some minor hiccups involving an ill-advised upgrade and other technician - induced error, Cacti has been humming happily along for months. It had drive space issues for a while, but that was easily resolved by adding another volume to the VM, mounting it, and moving over the MySQL and .rrd files. No mess, no fuss.

I came in this morning to the dreaded "Cacti is down" email message from one of our techs. My first thought was disk space. Occasionally someone will set the Cacti log file to verbose and forget to turn it back. This was not the case.

Mysqld was not running and would not start. However the error was not about disk space, but about the errmsg.sys file not containing the correct number of errors. This was a new one.

While searching around the installation I noticed that all of my RRD files were gone. Kaput. MIA. A quick check of the nightly backups showed that they were missing from the last three nightlies as well.

Here's what happened. The YUM installation in the Cacti-EZ distro is set to automatically update itself periodically. On July 2 it upgraded the version of MySQL. At that point MySQL would not start because the upgrade process failed to update the errmsg.sys file that was in the new partition where I moved the MYSQL data to.

As part of it's cleanup process, Cacti deletes RRD files that are over X days old. In my environment I had this set to two. This is to prevent RRD files from nodes that no longer exist from clogging your system. When MySQL quit, Cacti stopped updating the RRD files. As soon as they had a date that was over two days old, Cacti dutifully deleted them all.

I was able to copy the errmsg.sys from another location (/usr/share/mysql/english) and get MySQL started again. I restored the last good copy of the graphs, leaving a few days gap between when it quit and when I got it working.

Lessons learned? Automatic is great, until it automatically breaks things. I've set the RRD remove period to a month and changed the number of backups that I keep "live" on our SAN to five. If this happens again I'll consider a nightly copy of the errmsg.sys from the location where I know it will get updated correctly.

Posted by pwstrain | with no comments
Filed under:

So I last posted about creating a collection based on a user's membership in an AD Group, and tying that to a workstation based on the TopConsoleUser field.

Of course as soon as I let it loose that we now had that functionality, our group wanted to be able to run reports against it. "No problem!", I thought. Just translate the WQL to SQL and everything will be fabulous. Long story short, after several hours spread over a couple of days I threw in the towel. The WQL just does not line up in any way with SQL; the process where we actually match userid against AD Group is somehow hidden behind the curtain.editions-std

A quick call and our SQL guru comes to the rescue. After trying to parse it and also being unable to figure it out, we ran a SQL Profiler capture when executing the following WQL query:

select distinct UniqueUserName from  SMS_R_User where UserGroupName = "<domain>\\<GroupA>"

After some searching, we found the following query in the Profiler trace:

select distinct Name0 from v_R_System INNER JOIN v_GS_SYSTEM_CONSOLE_USAGE on v_GS_SYSTEM_CONSOLE_USAGE.ResourceID = v_R_System.ResourceID where v_GS_SYSTEM_CONSOLE_USAGE.TopConsoleUser0 in (select  distinct SMS_R_User.Unique_User_Name0 from User_DISC AS SMS_R_User LEFT OUTER JOIN User_User_Group_Name_ARR AS __erUser_User_Group_Name_ARR0 ON SMS_R_User.ItemKey = __erUser_User_Group_Name_ARR0.ItemKey   where __erUser_User_Group_Name_ARR0.User_Group_Name0 = '<domain>\<GroupA>')

A bit more  complex, yes? Turns out WQL is hiding some rather hinky SQL stuff from us, including tables we're not used to dealing with. The table dbo.User_User_Group_Name_ARR0 table appears to be where we actually map user names to AD Groups, by means of the ItemKey field.

In order to get this to return the workstation ID using the TopConsoleUser field, I turned it into this query:

select distinct Name0 from v_R_System INNER JOIN v_GS_SYSTEM_CONSOLE_USAGE on v_GS_SYSTEM_CONSOLE_USAGE.ResourceID = v_R_System.ResourceID where v_GS_SYSTEM_CONSOLE_USAGE.TopConsoleUser0 in (select  distinct SMS_R_User.Unique_User_Name0 from User_DISC AS SMS_R_User LEFT OUTER JOIN User_User_Group_Name_ARR AS __erUser_User_Group_Name_ARR0 ON SMS_R_User.ItemKey = __erUser_User_Group_Name_ARR0.ItemKey   where __erUser_User_Group_Name_ARR0.User_Group_Name0 = '<domain>\<GroupA>')

So now by replacing <domain>\<GroupA>' with pertinent info, we get the report that people are clamoring for. I may even make it a variable, with choices that auto-populate from the table. Yeah, that's it!

Now granted you have to take care of a couple of permissions issues on the 'new' tables (specifically  dbo.User_User_Group_Name_ARR0 and dbo.User_DISC) to get them to appear in the web console view. Granting "Select" on these was not a concern for me, so I went ahead. Your shop my be a bit more touchy about that requirement.

Again this was SMS 2003, SP3, Asset Intelligence deployed. YMMV.

Posted by pwstrain | with no comments
Filed under: