SQL Server 2005 introduced many new switches and features in T-SQL. Often times new features are met with “Why would I need to do that?”. The reality is that the SQL Server development team are not adding features for nothing.
One of my favorites is the OUTPUT clause. The OUTPUT clause can be used in conjunction with INSERT, UPDATE and DELETE statements. This post will show how I used the OUTPUT clause in a real world scenario with UPDATE.
Situation: While working on my data warehouse it was discovered that certain records were not being loaded into the warehouse because one of the lookup tables did not include one or more records that were required for an INNER JOIN. Since this has been going on over a period of several years it was considered important to reload those records.
Fortunately the source for the incoming data has been stored in a “staging” table so we have all of the original incoming data. One of the columns in the staging table is the load_date, which stores the date that the data was extracted from the source system.
Only a small number of records are affected each time the warehouse is loaded, but they are scattered over a period of several years of data. The load routine utilizes the load_date column to identify the records that need to be inserted or updated on the warehouse.
In order to utilize the existing load routine, i.e.., not have to craft a custom loader to identify the missing records, we would like to set the load_date of all of the missing records to the current date and run the load routine for the existing date.
Using the OUTPUT clause we will be able to update the missing stage records with a common load_date value and at the same time save the original load dates for the records in a separate table. We can then run the ETL to load the missing records. After the load successfully completes, we will restore the stage table to the original load dates.
Implementation Details: The first step is to create an output table where we will store 3 things: the primary key of the table, the original value of the column that is going to be updated, and the newly updated value of the updated column (optional but can be handy).
We want to save the primary key so we can conveniently do the update later after we’ve verified the results of the ETL load.
CREATE TABLE dbo.tmp_stg_output (stg_pkey_id int, load_date_before datetime, load_date_after datetime ); CREATE UNIQUE CLUSTERED INDEX idx_uc_output_clause ON dbo.tmp_stg_output (stg_pkey_id);
Next we’ll update the stage table with our desired load_date. We’ll also include the OUTPUT clause to save the data to our temp table.
UPDATE stg SET load_date = '03/12/2010' OUTPUT inserted.stg_pkey_id, deleted.load_date, inserted.load_date INTO tmp_stg_output FROM STG_table stg WHERE stg.section_code = 2213 AND stg.district_code = 197l;
This is just your run of the mill UPDATE statement except we have the OUTPUT clause that indicates the 3 columns to save to the tmp table. If you’re not familiar with the special tables INSERTED and DELETED check out BOL, Using the inserted and deleted Tables. My first encounter with these special tables were when I was creating triggers.
For the purpose of the use of the OUTPUT clause all we need to understand is that the INSERTED table contains the data that is changing. So in this example, the INSERTED.load_date will save the new load_date value. Also note, the INSERTED.stg_pkey_id could have been DELETED.stg_pkey_id. The primary key of the table doesn’t change so the INSERTED and DELETED tables will contain the same value for this column.
The DELETED.load_date will contain what was originally in the table before the update.
At this point we can run the load ETL to load our missing records. The details of the ETL are not relevant to this post…
Once we’ve validated we have a good load of our missing records we can restore our stage table to it’s original state by using our handy little temp table.
-- Run the load package using the load_date of 20100312 -- Verify the results are correct -- Set the stg table back to it's original state UPDATE stg SET load_date = t.load_date_before FROM dbo.STG_table stg INNER JOIN dbo.tmp_stg_output t ON t.stg_pkey_id = stg.stg_pkey_id;
Conclusion: Using the OUTPUT clause in conjunction with the UPDATE statement we were able to capture the state of our staging table, update one of it’s columns to allow us to easily perform an ETL load. Once the ETL finishes and is verified we restore the staging table to it’s original state using the temp table we created and populated with the OUTPUT clause.
Similar capabilities exist for INSERT and DELETE statements.
In our example we used an actual table that was created with the CREATE TABLE statement. We also could have just as easily used a temp table (#tmp_stg_output, ##tmp_stg_output), or a table variable (@tmp_stg_output). Obviously the table variable would not have given us the luxury of validating the data because the table is no longer available after the batch completes.
The risk of using #temp tables is that they are dropped when the session ends. If your session disconnects as a result of a network glitch the data in your temp output table would be lost.
Stay curious – keep learning…
Dave
Leave a Reply
Using Gravatars in the comments - get your own and be recognized!
XHTML: These are some of the tags you can use:
<a href=""> <b> <blockquote> <code> <em> <i> <strike> <strong>