In yesterday’s post we pulled a Thomas Edison and showed one of 10,000 ways how not to write the number of records being processed using the ROW COUNT transformation.
We thought the process was straight forward, but it didn’t work. Our approach was:
1. Create a query that processed some records.
2. Send the result set of the query to the ROW COUNT transform, to count the rows.
3. Since the ROW COUNT transform populates a variable, use a DERIVED COLUMN transform to place the variable into a virtual table column.
4. Write to a flat file destination the value of the ROW COUNT variable.
The result set was a zero for every row that the original query generated. The reason for the zeros is because the ROW COUNT transform does not populate the SSIS variable with the row count until the DATA FLOW completes.
The second problem with the output was that we didn’t want 188 copies of the row count, we only wanted one. That means we need to find a way to process 188 records but write out the row count only once.
Today’s post will build on what we learned yesterday to create a package that will actually work. The trick of making the ROW COUNT transformation usable is to access the variable OUTSIDE of the DATA FLOW that populates it.
Here’s what the control flow will look like for our working package.
The general principal behind this is the first data flow will use the ROW COUNT transformation to populate a RowCount variable. The second data flow will use the local variable set by the ROW COUNT transform to write the value, one time, to a flat file.
In a production package the first data flow would probably write out the data of an extract file; the second data flow would be used to write out a control file, containing the number of rows expected in the data file and any other related information, such as the extraction date.
Remember, the ROW COUNT transform does not actually set the value of the variable until all the data has passed through. Hence, the need for the second data flow.
Now let’s take a look under the hood of each of the data flows.
DATA FLOW 1: Here’s what the first data flow looks like:
We re-used yesterday’s query, which is a query that selects about 2 percent of the rows from the AdventureWorks Sales.Customer table. Then we pump the result set into a ROW COUNT transformation that sets the user variable RowCount variable.
The configuration details of these two transformations were posted in yesterday’s Part I post.
One additional lesson that one might get from this… The ROW COUNT transform can be used as a terminator for your data flow without actually doing anything. It’s simple to setup and is an option when testing other data source functionality.
DATA FLOW 2: The second data flow is only slightly more interesting so I’ll go into slightly more detail on it’s configuration.
The DERIVED COLUMN and FLAT FILE DESTINATION transforms are literally cut and paste from yesterday’s single data flow approach that didn’t work. Nothing needed to be changed.
The concept behind this data flow is to map the RowCount user variable that was set AFTER the first data flow exited to a data column and write the value of that column to the output file.
At first I attempted to build this data flow without the OLE DB data source. Although the package ran, no data was written to the output file. The reason is because the DERIVED COLUMN transform needs some sort of data driver, even though we only want to utilize the RowCount variable.
In order to get the RowCount variable to be written out I prefixed the DERIVED COLUMN with an OLE DB transform. Here’s what’s in the OLE DB transform:
It doesn’t get any simpler than this. This simple query does two things:
1. Generates an input data stream that will allow the DERIVED COLUMN to fire.
2. Generates a single row so the ROW COUNT will only be written out once.
Just for completeness, here’s the configuration of the DERIVED COLUMN transform:
Again, it doesn’t get any simpler than that.
CONCLUSION This post shows how to use the ROW COUNT transformation, utilizing 2 distinct data flows, to write the row count variable out to a flat file. Although the focus has been on the RowCount variable, the use of the DERIVED COLUMN transform can be used to write ANY variable out to ANY destination.
In my next post I’ll show how to write the row count of a transform to a flat file without the use of the ROW COUNT transformation at all.
Happy SQL-ing!
This does not work, it will not let me pass a variable from one dataflow to another.
Disregard my last comment, I figured out that I had the scope wrong for my variable. This page was very helpful in helping me resolve an issue, thank you!