Even though disk space is cheap, it continues to be the first resource to run out on a server. And once it runs out, it’s a constant battle to keep the database size in check.
To be sure, there are things that can be done to keep database growth in check:
- Archive. I’ve seen databases with six years’ worth of data in them, yet our contract with the client only required four years. That means we were storing 50 percent more data than we needed to. Why don’t we set up an automatic archival process? Probably because we don’t have a framework we can drop in place, so we’d have to build it from scratch. With all of the day-to-day operational demands, who has the time to build archive systems? That’s fodder for another post…
- Sliding window partitions. This is similar to archiving, with a twist. It requires a fairly sophisticated partitioning process to be in place: as new data comes rolling in, we automatically roll the oldest data out of the table. Not only does this maintain a consistent table size, but in theory the data that just “slid out” of the table goes to archive storage (i.e., slower and cheaper media). There would be more sliding windows deployed if there were a wizard-driven process to create the partitions and all of the supporting functions. Since that is not the case, there will probably continue to be plenty of opportunities for consultants specializing in partitioning to speed up ETL and report queries.
- Junk/temp tables. We all do it, so fess up. We’re on the production server and we create a “temp” table, except it’s not in tempdb; it’s a permanent table. Temp tables are usually easy to spot. They look like tmp_data_im_loading_today, dgt_new_customer_fulfillments, temp_20120215, etc. You need to be a little careful deleting these tables, because I have seen ETL processes that actually rely on “temp” tables… so they’re really not temp, are they? Identifying junk tables has saved us gigabytes of space.
- Too many indexes. I’ve seen tables where the combined size of the non-clustered indexes was three times the size of the data itself. If you come across this, chances are something is wrong. Not only are you wasting space maintaining these indexes, but ETL loads will need to update them, slowing the loads down. And for even basic queries, the optimizer will have a ton of indexes to consider (and probably never use), making query plan generation that much more complex and costly.
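The sliding-window mechanics mentioned above boil down to a couple of metadata-only operations per period: switch the oldest partition out, then merge its boundary. Here is a minimal sketch, assuming a monthly RANGE RIGHT partition function and scheme (pf_SalesDate / ps_SalesDate), a partitioned table dbo.Sales, and an identically structured staging table dbo.Sales_Archive; all of these object names and dates are hypothetical:

```sql
-- Roll the oldest month out of the partitioned table.
-- Assumes dbo.Sales_Archive lives on the same filegroup as partition 1
-- and matches dbo.Sales column-for-column. Names and dates are hypothetical.

-- 1. Switch the oldest partition out: a metadata operation, near-instant.
ALTER TABLE dbo.Sales
    SWITCH PARTITION 1 TO dbo.Sales_Archive;

-- 2. Remove the now-empty boundary so the partition numbers slide down.
ALTER PARTITION FUNCTION pf_SalesDate()
    MERGE RANGE ('2012-01-01');

-- 3. At the other end, add next month's boundary before data arrives.
ALTER PARTITION SCHEME ps_SalesDate NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_SalesDate()
    SPLIT RANGE ('2013-01-01');
```

The switch and merge/split are what make the window “slide”; the archived rows can then be moved to cheaper storage on their own schedule.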
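Junk tables are usually easy enough to find by name alone. A minimal sketch of a hunt query follows; the LIKE patterns are my assumptions based on the examples above, so tailor them to your shop’s naming habits:

```sql
-- Candidate junk/temp tables in the current database, largest first.
-- The name patterns below are assumptions; adjust to your own conventions.
SELECT s.name AS schema_name,
       t.name AS table_name,
       SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.dm_db_partition_stats AS ps ON ps.object_id = t.object_id
WHERE t.name LIKE 'tmp[_]%'
   OR t.name LIKE 'temp[_]%'
   OR t.name LIKE '%[_]20[0-9][0-9][0-1][0-9][0-3][0-9]'  -- trailing yyyymmdd
GROUP BY s.name, t.name
ORDER BY reserved_mb DESC;
```

Remember the caveat above: verify nothing in your ETL actually reads from these before dropping them.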
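The index-bloat case can be spotted the same way: compare the pages used by non-clustered indexes against the pages used by the heap or clustered index. A sketch, with the three-times threshold hard-coded only because that is the ratio I mention above:

```sql
-- Tables whose non-clustered indexes outweigh the data itself.
-- index_id 0 = heap, 1 = clustered index, > 1 = non-clustered indexes.
SELECT OBJECT_SCHEMA_NAME(ps.object_id) AS schema_name,
       OBJECT_NAME(ps.object_id)        AS table_name,
       SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.used_page_count ELSE 0 END) * 8 / 1024 AS data_mb,
       SUM(CASE WHEN ps.index_id > 1     THEN ps.used_page_count ELSE 0 END) * 8 / 1024 AS nc_index_mb
FROM sys.dm_db_partition_stats AS ps
WHERE OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
GROUP BY ps.object_id
HAVING SUM(CASE WHEN ps.index_id > 1 THEN ps.used_page_count ELSE 0 END)
     > 3 * SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.used_page_count ELSE 0 END)  -- 3x is arbitrary
ORDER BY nc_index_mb DESC;
```

Anything this query flags deserves a look at sys.dm_db_index_usage_stats before you start dropping indexes.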
Today’s query lists an instance’s tables for all user databases. Included in the result set is the size of the heap or clustered index (which is essentially the size of the data, since the index component of a clustered index is on the order of 1-2 percent of the index size). It also lists the number and aggregate size of the non-clustered indexes.
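The query can be sketched along these lines; this is a reconstruction of what I describe, not necessarily the exact code in the download, and it leans on the undocumented (but widely used) sp_MSforeachdb procedure to walk the databases:

```sql
-- For every user database: each table's data size (heap or clustered index)
-- plus the count and aggregate size of its non-clustered indexes.
IF OBJECT_ID('tempdb..#table_sizes') IS NOT NULL DROP TABLE #table_sizes;
CREATE TABLE #table_sizes (
    database_name  sysname,
    schema_name    sysname,
    table_name     sysname,
    data_mb        bigint,
    nc_index_count int,
    nc_index_mb    bigint
);

EXEC sp_MSforeachdb N'
USE [?];
IF DB_ID() > 4  -- skip master, tempdb, model, msdb
INSERT INTO #table_sizes
SELECT DB_NAME(),
       OBJECT_SCHEMA_NAME(ps.object_id),
       OBJECT_NAME(ps.object_id),
       SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.used_page_count ELSE 0 END) * 8 / 1024,
       COUNT(DISTINCT CASE WHEN ps.index_id > 1 THEN ps.index_id END),
       SUM(CASE WHEN ps.index_id > 1 THEN ps.used_page_count ELSE 0 END) * 8 / 1024
FROM sys.dm_db_partition_stats AS ps
WHERE OBJECTPROPERTY(ps.object_id, ''IsUserTable'') = 1
GROUP BY ps.object_id;';

SELECT * FROM #table_sizes ORDER BY data_mb DESC;
```

Sorting by data_mb surfaces the biggest tables first, which is usually where the junk tables and index bloat hide.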
Using this data you will be able to review your table structures, identify junk tables, and spot tables that probably have unnecessary non-clustered indexes. What you do with this information all depends on whether you are swamped with operational fires or can actually take the time to clean house.
I’m including the code in this post but will also make it available in the downloads section in the sidebar of this post.
Stay curious; keep learning…