Recently I’ve been working on performance issues relating to SELECT queries hitting some very large data warehouse stage tables. In one instance I encountered a table that had over 300M records where the SELECT statement that reads from the STAGE table had an estimated execution cost of 3500. I didn’t find this by accident… it was by far the longest part of the entire ETL process.
After doing some testing and validation in a development database I determined that I can get that query plan down do 3 just by re-indexing the table. In this case the primary key constraint was on the surrogate key, which didn’t help at all when reading recently added data from the table.
Since it is still a work in progress I’ll save the details of the re-index process for a later post, when I have better empirical data.
I also noticed that the data type for the date column ([load_date]) that is being used by the SELECT statement to identify the most recently added records to the data store was a full blown 8-byte datetime. The earliest date in the table only goes back to 2004 and the values stored in the table are just comprised of the month/day/year. In other words, if the earliest date is not less than 1900 and we don’t need the 3ms time precision, why not go with a smalldatetime datatype?
At 300M records in the table, and a 4 byte per record savings, a simple conversion of the datetime to smalldatetime datatype will save 1,200M, or 1.2G That’s just in the storage of this column. That doesn’t factor in any savings in indexes, backups, tempdb and memory needed to process queries, etc.
Obviously, the database is not SQL Server 2008, otherwise we could have gone with the DATE datatype, which uses 3 bytes instead of the smalldatetime’s 4.
The query to change the datatype of a column follows:
ALTER TABLE [dbo].[STG_my_table] ALTER COLUMN [load_date] smalldatetime NOT NULL;
This simple query accomplishes two objectives:
- Converts the [load_date] column from it’s existing 8-byte datetime to 4-byte smalldatetime.
- Makes the column not nullable.
Please reference SQL Server Books Online for the details on which columns can be changed using this technique. In general, if the data type converts with an implicit conversion it will work.
Also, be mindful that this query will require a large amount of transaction log space, it will take a long time for large tables (over 5 hours in the case of my 300M table) and will lock up the table making it unavailable to production users.
If you need to minimize the unavailability time of the table you’ll need to extract the data to a flat file, re-script the table, and reload the data from the flat file. Not exactly a painless process.
Of course, it’s always possible to change the column type through SQL Server Management Studio, but if the task is going to be repeated for several tables it’s much easier to just T-SQL it out and go to bed.
Happy SQL-ing!
Leave a Reply
Using Gravatars in the comments - get your own and be recognized!
XHTML: These are some of the tags you can use:
<a href=""> <b> <blockquote> <code> <em> <i> <strike> <strong>