THE PROBLEM
Sometimes your SMS infrastructure seems to be swimming right along and everything is going right and then something you didn't think too much about happens. You start getting alerts that the disks are full or the backups are failing because of low disk space and you're wondering what the heck is going on. A cursory look at the disks shows there's nothing going on other than the SMS DB getting larger. You've got maintenance tasks scheduled weekly or daily that delete out old records, you haven't changed the inventory frequency, you haven't turned on lots of new inventory items, you're not aware of anything that should be causing this.
OK, so now what? You can go from table to table in SQL Enterprise Manager and manually see how big the tables are or you can run a SQL script to see how big the tables are, but what does that tell you? Do you have a baseline of table sizes? Do you know which tables are SUPPOSED to be large in SMS? And what IS large for your environment? Large at a Fortune 50 company is much different than at, say a family run construction firm in Golden Valley. Oh, and are you tracking growth, not just table size? The largest tables might not be the tables that are growing fastest and causing you problems today.
THE SOLUTION
(NOTE: because my posts have the potential of reaching a pretty wide group of people with a wide range of skills, I try not to make any assumptions as to how much you know about the topics I write about. I will keep it challenging for the geeks out there, but I will also try to give as much background and walkthrough as is reasonable for those of you who don't know this stuff. So if you're a wiz, don't get all worked up that I'm spelling everything out. I wish someone would have done that for me sometimes, so I'm doing it now. It can only help)
You need to start tracking table sizes and row counts over time. I'm sure there's all kinds of tools out there you can pay for that will do this for you. But if you're a stingy ***, or you just want to roll your own, here's what you can do.
(If you're a cheater, here's a SHORTCUT TO THE CODE)
You only need a SQL job, a table, and a user defined function (well, you could get by with a view, but let's be fancy).
Let's start by determining what information we want to get and create a query to get it.
So, to me it seems at the very minimum we need to keep track of
- The table ID for each table in the SMS database
- The table names (I don't want to have to join to get the table name later on)
- The row counts
- The total bytes consumed by the table - which is more than just the sum of the data size, and requires us to also get:
- Data Bytes
- Index Bytes
- Unused (freespace) Bytes
- And to keep track of the sizes over time, we'll need to store the time.
I don't like reinventing the wheel, so I went on to Google and copied someone elses script for determining table sizes in SQL. I don't remember where I got it, but I'm pretty sure it's one of these sites. Forgive me if it was your code and you feel like you didn't get credit...it looks like most of the code came from a MS stored procedure anyway
. Whatever the case, I use BIGINT instead of INT so it didn't fail on HUGE tables and now it uses table variables instead of temp tables. Here's what I came up with (again, mostly pilfered code):
SNIPPET #1BEGIN
DECLARE @id INT
DECLARE @type CHARACTER(2)
DECLARE @pages BIGINT
DECLARE @dbname SYSNAME
DECLARE @dbsize DEC(15,0)
DECLARE @bytesperpage DEC(15,0)
DECLARE @pagesperMB DEC(15,0)
DECLARE @low INT
-- DECLARE A TABLE VARIABLE TO HOLD THE RESULTS. BECAUSE WE'RE NOT TALKING ABOUT MASSIVE
-- AMOUNTS OF DATA, IT'S MUCH BETTER TO HAVE IN A TEMP TABLE VARIABLE BECAUSE THAT (USUALLY) PUTS
-- THE INTERMEDIATE RESULTS ALL IN RAM. IF YOU USE A TEMP TABLE THEN IT WRITES TO THE
-- TEMPDB AND/OR TEMPDB LOG
DECLARE @tmpTblSpace TABLE
(
objid INT NULL,
rows BIGINT NULL,
reserved DEC(15) NULL,
data DEC(15) NULL,
indexp DEC(15) NULL,
unused DEC(15) NULL
)
SET NOCOUNT ON
-- Create a cursor to loop through the user tables
DECLARE c_tables CURSOR FOR
SELECT id
FROM sysobjects
WHERE xtype = 'U'
-- NOW OPEN THE CURSOR, LOOP THROUGH THE TABLES AND INSERT INTO
-- THE TEMPORARY TABLE VARIABLE.
OPEN c_tables
FETCH NEXT FROM c_tables
INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
/* Code stolen from the sp_spaceused proc from MSSQL */
INSERT INTO @tmpTblSpace
(objid,
reserved)
SELECT objid = @id,
SUM(reserved)
FROM sysindexes
WHERE indid IN (0,1,255)
AND id = @id
SELECT @pages = SUM(dpages)
FROM sysindexes
WHERE indid < 2
AND id = @id
SELECT @pages = @pages + Isnull(SUM(used),0)
FROM sysindexes
WHERE indid = 255
AND id = @id
UPDATE @tmpTblSpace
SET data = @pages
WHERE objid = @id
/* index: sum(used) where indid in (0, 1, 255) - data */
UPDATE @tmpTblSpace
SET indexp = (SELECT SUM(used)
FROM sysindexes
WHERE indid IN (0,1,255)
AND id = @id) - data
WHERE objid = @id
/* unused: sum(reserved) - sum(used) where indid in (0, 1, 255) */
UPDATE @tmpTblSpace
SET unused = reserved - (SELECT SUM(used)
FROM sysindexes
WHERE indid IN (0,1,255)
AND id = @id)
WHERE objid = @id
UPDATE @tmpTblSpace
SET rows = i.rows
FROM sysindexes i
WHERE i.indid < 2
AND i.id = @id
AND objid = @id
FETCH NEXT FROM c_tables
INTO @id
END
SELECT @low = d.low
FROM master.dbo.spt_values d
WHERE d.NUMBER = 1
AND d.TYPE = 'E'
-- NOW THAT WE'VE COMPILED THE RESULTS, SELECT THEM FROM THE TABLE (VARIABLE) FOR ALL TO SEE.
-- (RESULTS ARE IN BYTES)
SELECT GetDate() TimeKey
,so.id Table_ID
,so.name Table_Name
,CONVERT(BIGINT,spc.rows) Row_Count
,CONVERT(BIGINT,Ltrim(Str(spc.reserved * @low,15,0))) Reserved_Bytes
,CONVERT(BIGINT,Ltrim(Str(spc.data * @low,15,0))) Data_Bytes
,CONVERT(BIGINT,Ltrim(Str(spc.indexp * @low,15,0))) Index_Bytes
,CONVERT(BIGINT,Ltrim(Str(spc.unused * @low,15,0))) Unused_Bytes
FROM @tmpTblSpace spc
INNER JOIN sysobjects so ON spc.objid = so.id
-- CLEANUP THE CURSOR
CLOSE c_tables
DEALLOCATE c_tablesEND
SAMPLE QUERY OUTPUT #1
Cool! That's about all the data we need! The important columns are Table_Name, Row_Count & Reserved_Bytes (which is data+index+unused and represents the full size of the table) but we'll put all of these columns in our table in case we need them.
Now that we have the query, let's create a table that will hold this data for long-term tracking
I know, it feels like cheating, but seems to me that the easiest way to create a table is to SELECT <stuff> INTO <table>. So, let's temporarily add one line to SNIPPET #1 above so we can create the table:
SNIPPET #2 - (THE VERY END OF SNIPPET #1)
-- NOW THAT WE'VE COMPILED THE RESULTS, SELECT THEM FROM THE TABLE (VARIABLE) FOR ALL TO SEE.
-- (RESULTS ARE IN BYTES)
SELECT GetDate() TimeKey
,so.id Table_ID
,so.name Table_Name
,CONVERT(BIGINT,spc.rows) Row_Count
,CONVERT(BIGINT,Ltrim(Str(spc.reserved * @low,15,0))) Reserved_Bytes
,CONVERT(BIGINT,Ltrim(Str(spc.data * @low,15,0))) Data_Bytes
,CONVERT(BIGINT,Ltrim(Str(spc.indexp * @low,15,0))) Index_Bytes
,CONVERT(BIGINT,Ltrim(Str(spc.unused * @low,15,0))) Unused_Bytes
INTO dbo._SDE_TableSizeTracking
FROM @tmpTblSpace spc
INNER JOIN sysobjects so ON spc.objid = so.id
-- CLEANUP THE CURSOR
CLOSE c_tables
DEALLOCATE c_tables
END
Now, just run it once and the new table will be created with all the appropriate data types. Man, feels like cheating. Gotta be some slow GUI way to do it :)
Now that we have the query and the table, let's turn the query into a SQL job so it can be regularly scheduled to get data into the new table
First, let's turn that last bit of code in SNIPPET #2 from a SELECT <stuff> INTO <table> (which creates a table) to an INSERT INTO <table> SELECT <stuff> (which appends to a table)
SNIPPET #3
-- NOW THAT WE'VE COMPILED THE RESULTS, SELECT THEM FROM THE TABLE (VARIABLE) FOR ALL TO SEE.
-- (RESULTS ARE IN BYTES)
INSERT INTO dbo._SDE_TableSizeTracking
SELECT GetDate() TimeKey
,so.id Table_ID
,so.name Table_Name
,CONVERT(BIGINT,spc.rows) Row_Count
,CONVERT(BIGINT,Ltrim(Str(spc.reserved * @low,15,0))) Reserved_Bytes
,CONVERT(BIGINT,Ltrim(Str(spc.data * @low,15,0))) Data_Bytes
,CONVERT(BIGINT,Ltrim(Str(spc.indexp * @low,15,0))) Index_Bytes
,CONVERT(BIGINT,Ltrim(Str(spc.unused * @low,15,0))) Unused_Bytes
FROM @tmpTblSpace spc
INNER JOIN sysobjects so ON spc.objid = so.id
-- CLEANUP THE CURSOR
CLOSE c_tables
DEALLOCATE c_tables
END
Now lets take this new SQL code (essentially SNIPPET #1 with the highlighted yellow line from SNIPPET #3 added) and create the SQL job.
From SQL Enterprise manager, right click on SQL Server Agent -> Jobs and select New Job...

Give the new job a name, owner, and category and then click on STEPS

On STEPS click on NEW
![]()

Enter a Step name, Type = Transacti-SQL script (T-SQL), database, and then paste the new SQL code in the COMMAND box.

Now it's your job to set the schedule you want and finish this dialog. When you're done you'll have a new SQL job under SQL Server Agent with the name you called it.
My suggestion for the schedule: There are a lot of inserts, updates and deletes going on at all times on a normal SMS server, things come in, they fill tables for a short time, and then they get deleted or moved to other tables. If you set this too short (15 minutes?), you'll capture a whole lot of useless ups and downs. But if the server is barfing and you need to find out as soon as is reasonable what's growing/shrinking, I'd set it to about every hour. By the end of the day you'll have the start of some really good information. Keep in mind, you might not get really good information for a few days and keep watch of your maintenance tasks, if you have maintenance tasks set weekly to delete old records, almost all tables are going to naturally grow during the week and then drop after the maintenance tasks. Be aware of that so you're not chasing table growth that's normal and will be deleted after the week.
I wouldn't be too worried about how much disk space the new size tracking table takes up, it's just a bunch of INT/BIGINT columns, except for the table_name, so it won't accumulate data very fast or get too big too fast. It will gain about 36MB per month if scheduled once every hour, so that's what we do. It's probably prudent to create another SQL job to clean out records after a certain amount of time. I'll leave that for your homework.
When you've got that all done, manually run it once (right click on the job and click START JOB ON STEP...) This will populate the table will some data. Some time will have to pass and this will need to be run a few times before we can get meaningful data.
At this point we've got a table and we've got a scheduled job to populate that table.
Let's create the UDF to return the data more meaningfully
Now we could create a view to return the data we're looking for, but I'm thinking user-defined function (UDF). Specifically, I'm thinking of a Multistatement Table-Valued function.
Quickly, let's determine what a meaningful display of this data would look like so I can help you understand why I think a UDF would be best:
We've got a table with a bunch of sizes and row counts and the time those numbers were recorded. The following data would make this more meaningful to me:
- Size of each table when we first started collecting
- The row count for the same time period too
- Size of each table for the last collection
- The row count for the last collection
- The growth (or shrinkage) in bytes of each table between the start and end
- The growth (or shrinkage) in rows of each table between the start and end
- I'd also like the ability to specify a start and end date in case I only want to see the difference for a single hour, day, week, month, etc.
In deciding view or function, I think about the data we need and what it would take to get it:
TABLE #1
- We need a query to pull the MIN TimeKey so we can get sizes at the beginning of collection
select MIN(timeKey) from dbo._SDE_TableSizeTracking
- We need a query to pull the MAX TimeKey so we can get sizes at the last collection
select MAX(timeKey) from dbo._SDE_TableSizeTracking
- We need a query to pull the sizes at the beginning
SELECT Table_Name AS Table_Name,
Data_Bytes AS StartBytes,
Row_Count AS StartRows
FROM dbo._SDE_TableSizeTracking
WHERE TimeKey = <that min timekey from 1>
- We need a query to pull the sizes at the end
SELECT Table_Name AS Table_Name,
Data_Bytes AS StartBytes,
Row_Count AS StartRows
FROM dbo._SDE_TableSizeTracking
WHERE TimeKey = <that max timekey from 2>
- Then we need to link 3 & 4 together based on table_name so we can get all the data on one row and calculate growth.
SELECT tblStart.Table_Name AS TableName,
tblEnd.EndBytes - tblStart.StartBytes AS GrowthBytes,
tblEnd.EndRows - tblStart.StartRows AS GrowthRows,
tblEnd.EndBytes AS CurrentBytes,
tblStart.StartBytes AS StartBytes,
tblEnd.EndRows AS CurrentRows,
tblStart.StartRows AS StartRows
FROM <the beginning stuff in 3>
INNER JOIN <the ending stuff in 4>
ON 3.table_Name = 4.table_name
So if we were to take all of that stuff and make a view out of it, it would look a little something like this (the colors indicate which step it comes from above):
SNIPPET #1
SELECT tblStart.Table_Name AS TableName,
tblEnd.EndBytes - tblStart.StartBytes AS GrowthBytes,
tblEnd.EndRows - tblStart.StartRows AS GrowthRows,
tblEnd.EndBytes AS CurrentBytes,
tblStart.StartBytes, tblEnd.EndRows AS CurrentRows,
tblStart.StartRows
FROM
(SELECT Table_Name,
Data_Bytes AS StartBytes,
Row_Count AS StartRows
FROM dbo._SDE_TableSizeTracking AS ts
WHERE TimeKey IN (SELECT MIN(TimeKey) AS MinTime FROM dbo._SDE_TableSizeTracking)) AS tblStart
INNER JOIN
(SELECT Table_Name,
Data_Bytes AS EndBytes,
Row_Count AS EndRows
FROM dbo._SDE_TableSizeTracking AS ts
WHERE TimeKey IN (SELECT MAX(TimeKey) AS MaxTime FROM dbo._SDE_TableSizeTracking)) AS tblEnd
ON tblStart.Table_Name = tblEnd.Table_Name
Which I guess is OK. If you run this query by itself, it works -- its a query which you can turn into a view to pull growth data for the whole period you've collected. So if you're fine with it, I'm fine with it.
Now I personally don't like the IN statements WHERE TimeKey IN (SELECT... so I'm thinking I want to declare a variable to hold those values. Can't do that in a view, and more importantly, we can't specify a particular time range by passing parameters to this as a view. That right there tells me to use a function or procedure. The choices are between an Inline table-valued function and a Multistatement Table-Valued function or a stored procedure. Well, an inline table-valued function is out because I envision multiple statements, and variable declarations and an inline table-valued function can only have a single select statement in it. A stored procedure would work, but to use the data you get back from a stored procedure you need to put it into a temp table or table variable first and that's not as sexy as a multi-statement table-valued function. So, let's use that. I'm not envisioning anything too complicated that would force us to use a stored procedure.
So let's use the same list of stuff in TABLE #1 above to create a function.
A multi-statement function has the following format:
CREATE FUNCTION dbo.udf_TableTracker
(
-- Add the parameters for the function here
)
RETURNS
@RetVar TABLE
(
-- Add the column definitions for the output TABLE variable here
)
AS
BEGIN
-- Code here - insert rows into @RetVar when done
RETURN
END
Let's use the same junk from Table #1 to fill out this function template (stuff in RED is extra supporting stuff not directly from table #1):
SNIPPET #2
CREATE FUNCTION dbo.udf_TableTracker
(
-- Add the parameters for the function here later
)
RETURNS
@RetVal TABLE
(
-- TABLE DEFINITION BASED ON THE COLUMNS WE'RE SELECTING
TableName varchar(64),
GrowthBytes bigint,
GrowthRows bigint,
CurrentBytes bigint,
StartBytes bigint,
CurrentRows bigint,
StartRows bigint
)
AS
BEGIN
DECLARE @MINDATE DATETIME
DECLARE @MAXDATE DATETIME
SELECT @MINDATE = MIN(TimeKey),
@MAXDATE = MAX(TimeKey)
FROM dbo._SDE_TableSizeTracking
-- DATA IS OUTPUT BY INSERTING DATA INTO FUNCTION'S RETURN VARIABLE ABOVE
INSERT INTO @RetVal
SELECT tblStart.Table_Name AS TableName,
tblEnd.EndBytes - tblStart.StartBytes AS GrowthBytes,
tblEnd.EndRows - tblStart.StartRows AS GrowthRows,
tblEnd.EndBytes AS CurrentBytes,
tblStart.StartBytes AS StartBytes,
tblEnd.EndRows AS CurrentRows,
tblStart.StartRows AS StartRows
FROM
(SELECT Table_Name,
Data_Bytes AS StartBytes,
Row_Count AS StartRows
FROM dbo._SDE_TableSizeTracking AS ts
WHERE TimeKey = @MINDATE) AS tblStart
INNER JOIN
(SELECT Table_Name,
Data_Bytes AS EndBytes,
Row_Count AS EndRows
FROM dbo._SDE_TableSizeTracking AS ts
WHERE TimeKey =@MAXDATE) AS tblEnd
ON tblStart.Table_Name = tblEnd.Table_Name
RETURN
END
That's should facilitate everything that was in the query from snippet #1. Let's make sure it all makes sense to you before we finish it:
- Define the table output from this function
- Declare some variables to hold the min and max timekey
- Assign the min and max timekeys to those variables
- Select the data from the query in snippet #1 using the new @mindate and @maxdate
- Insert that data into the @RetVal variable which will be output as a table when the function is called.
Almost done. Right now if you executed this it would create the function and you could select from it, but it wouldn't buy us more than the view. So let's add a PERIODSTART and PERIODEND parameter and modify the code to use them instead of just the min/max. And I'd like to have default values in case we don't want to put a start/end.
Let's finish the function with parameters and default values
SNIPPET #3
CREATE FUNCTION dbo.udf_TableTracker
(
@PERIODSTART DATETIME = NULL,
@PERIODEND DATETIME = NULL
)
RETURNS
@RetVal TABLE
(
-- TABLE DEFINITION BASED ON THE COLUMNS WE'RE SELECTING
TableName varchar(64),
GrowthBytes bigint,
GrowthRows bigint,
CurrentBytes bigint,
StartBytes bigint,
CurrentRows bigint,
StartRows bigint
)
AS
BEGIN
DECLARE @MINDATE DATETIME
DECLARE @MAXDATE DATETIME
SELECT @MINDATE = MIN(COALESCE(@PERIODSTART,TimeKey)),
@MAXDATE = MAX(COALESCE(@PERIODEND,TimeKey))
FROM dbo._SDE_TableSizeTracking
-- DATA IS OUTPUT BY INSERTING DATA INTO FUNCTION'S RETURN VARIABLE ABOVE
INSERT INTO @RetVal
SELECT tblStart.Table_Name AS TableName,
tblEnd.EndBytes - tblStart.StartBytes AS GrowthBytes,
tblEnd.EndRows - tblStart.StartRows AS GrowthRows,
tblEnd.EndBytes AS CurrentBytes,
tblStart.StartBytes AS StartBytes,
tblEnd.EndRows AS CurrentRows,
tblStart.StartRows AS StartRows
FROM
(SELECT Table_Name,
Data_Bytes AS StartBytes,
Row_Count AS StartRows
FROM dbo._SDE_TableSizeTracking AS ts
WHERE TimeKey = @MINDATE) AS tblStart
INNER JOIN
(SELECT Table_Name,
Data_Bytes AS EndBytes,
Row_Count AS EndRows
FROM dbo._SDE_TableSizeTracking AS ts
WHERE TimeKey =@MAXDATE) AS tblEnd
ON tblStart.Table_Name = tblEnd.Table_Name
RETURN
END
You gotta appreciate how cool that is: Add the parameters, give them a default value of NULL, and put them in a COALESCE function with the timekey. That's it.
TEACHING MOMENT: COALESCE isn't a particularly sexy function and many people don't trip over it in their everyday use. Of all the SQL classes I've ever taken, I don't think I've ever learned about COALESCE, so I'm going to explain how it's working for us for those that haven't seen it.
COALESCE will essentially return the first NON-NULL number in a list of numbers (there's more to it, but that's enough for now). So if you don't pass any parameters to our UDF, it will default them to NULL. So when we get to the COALESCE, it will return the MIN/MAX timekey instead. However, if we pass in a PERIODSTART or PERIODEND parameter to our UDF, they will be NON-NULL and because they are the first parameter to the COALESCE function, they will be returned instead of the MIN/MAX timekey.
Execute the code in SNIPPET #3 to create the UDF so we can call it.
So execute the code in snippet #3 and it will create the UDF.
Now we can select from the udf like you would select from a table.
IMPORTANT NOTE: Unlike a stored procedure, you can't leave the parameters empty to use the default parameter values. Instead use the word DEFAULT with no quotes.
Let's try it:
SNIPPET #4
select * from
dbo.udf_TableTracker(DEFAULT,DEFAULT)
SAMPLE OUTPUT #1

Because the output of that function is a table, you treat it just like it was a table. You can order by columns, perform group by, add a WHERE clause to filter your results, etc. You can even get an execution plan (CTRL+L in Enterprise Manager) - I've got a big monitor, this might be a little wide for you, but check it all out:
EXECUTION PLAN #1

Well, the cost is only 11 and it only takes a second to run, but as more data fills that table it'll be more important to get indexes on that table. So let's do that quickly.
I'll use the code instead of the GUI. Let's create a unique clustered index using the timeKey and the table_name. Remember, because it's a clustered index it will order the physical data by the columns you specify.
SNIPPET #5
CREATE UNIQUE CLUSTERED INDEX [_SDE_TableSizeTracking_PK] ON [dbo].[_SDE_TableSizeTracking]
(
[TimeKey] ASC,
[Table_Name] ASC
)
WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = ON,
IGNORE_DUP_KEY = OFF,
DROP_EXISTING = OFF,
ONLINE = ON,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
FILLFACTOR = 100) ON [PRIMARY]
Now, one last look at the execution plan on snippet #4:
EXECUTION PLAN #2

Now that's more like it. Clustered index seek instead of full table scan and the cost is 3.79 instead of 11.22. I like it.
CONCLUSION + NEXT STEPS
I think we're done. Now we've got a table to hold table sizes and row counts, a SQL job to populate that table, a multi-statement table-valued function to efficiently and meaningfully pull that data, and a unique clustered index to speed it up.
- About the only things I'd add from here are some code in the UDF to make the range pulling a little better. Right now you have to know the exact timekey of the start or end period.
- Perhaps it would have been better to create a unique clustered primary key constraint instead of creating just the clustered index. When you create the primary key constraint, if there isn't a clustered index it will create one for you.
- Schedule - you may have to play around with the schedule to make it meaningful
- Deletion tasks - we didn't create anything to delete out old records out of that table after a period of time so this will just keep accumulating forever.
Hope that was informative! As always, I'm open to [constructive] criticizm, improvements, comments, questions, concerns, etc.
Number2 (John Nelson)
MyITForum - Forum Posts
MyITForum - Blog
