Table Info for Clustered Column Store Indexes

I’ve used a variation of this query for years and it worked pretty well, until clustered column store indexes came along.

The general problem is that clustered columnstore indexes store their data in LOB allocation units, although in-row allocation units are present as well.  Because my old query joined each partition to every allocation unit without filtering on the allocation unit type, tables with clustered columnstore indexes reported twice the actual row count.
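
If you want to see the double counting for yourself, a quick sanity check along these lines (the table name is just a placeholder) shows that a clustered columnstore table carries both allocation unit types, each repeating the full partition row count:

SELECT    au.[type]
        , au.type_desc
        , SUM(au.total_pages) AS total_pages
        , SUM(p.[rows]) AS [rows]
FROM        sys.partitions p
        INNER JOIN sys.allocation_units au
            ON    au.container_id = p.partition_id
WHERE    p.[object_id] = OBJECT_ID('dbo.MyColumnstoreTable')    -- placeholder table name
GROUP BY au.[type], au.type_desc;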

I am now working with Open Live Writer, and the plugins don’t seem to be available yet, so please excuse the plain formatting of the query below…

 

 

WITH nci AS (    SELECT    i.[object_id]
                        , COUNT(i.[object_id]) AS [NumIndexes]
                        , CAST((SUM(au.total_pages) * (8.0 * 1024.0)) / 1073741824.0 AS NUMERIC(12, 4)) AS GBytes
                FROM        sys.indexes i
                        INNER JOIN sys.partitions p
                            ON    p.[object_id] = i.[object_id]
                            AND    p.index_id = i.index_id
                        INNER JOIN sys.allocation_units au
                            ON    au.container_id = p.partition_id
                WHERE    i.index_id = 2
                AND        i.[object_id] > 100
                GROUP BY i.[object_id] )

SELECT    db_name() AS DBName
        , i.object_id
        , OBJECT_NAME(i.object_id) AS [TableName]
        , i.name AS [ClusteredIndexName]
        , i.index_id
        , i.type_desc AS index_type
        , SUM(p.[rows]) AS row_cnt
        , CAST((SUM(au.total_pages) * (8.0 * 1024.0)) / 1073741824.0 AS NUMERIC(12, 4)) AS CI_GBytes
        , nci.[NumIndexes]
        , nci.GBytes AS [NCI_IndexGBytes]
        , CAST((SUM(au.total_pages) * (8.0 * 1024.0)) / 1073741824.0 AS NUMERIC(12, 4)) + nci.GBytes AS [TotalTableSize]
        , p.data_compression_desc
FROM        sys.indexes i
        INNER JOIN sys.partitions p
            ON    p.[object_id] = i.[object_id]
            AND    p.index_id = i.index_id
        INNER JOIN sys.allocation_units au
            ON    au.container_id = p.partition_id
            AND    au.[type] = 1        -- IN_ROW_DATA (row store heaps and clustered indexes)
        LEFT OUTER JOIN nci nci
            ON    nci.[object_id] = i.[object_id]
WHERE    i.index_id <= 1
AND        i.[type] in (0, 1)        -- HEAP or CLUSTERED
AND        i.[object_id] > 100
GROUP BY i.[object_id]
        ,    i.name
        , i.index_id
        , i.type_desc
        , nci.NumIndexes
        , nci.GBytes
        , p.data_compression_desc

UNION

SELECT    db_name() AS DBName
        , i.object_id
        , OBJECT_NAME(i.object_id) AS [TableName]
        , i.name AS [ClusteredIndexName]
        , i.index_id
        , i.type_desc AS index_type
        , SUM(p.[rows]) AS row_cnt
        , CAST((SUM(au.total_pages) * (8.0 * 1024.0)) / 1073741824.0 AS NUMERIC(12, 4)) AS CI_GBytes
        , nci.[NumIndexes]
        , nci.GBytes AS [NCI_IndexGBytes]
        , CAST((SUM(au.total_pages) * (8.0 * 1024.0)) / 1073741824.0 AS NUMERIC(12, 4)) + nci.GBytes AS [TotalTableSize]
        , p.data_compression_desc
FROM        sys.indexes i
        INNER JOIN sys.partitions p
            ON    p.[object_id] = i.[object_id]
            AND    p.index_id = i.index_id
        INNER JOIN sys.allocation_units au
            ON    au.container_id = p.partition_id
            AND    au.[type] = 2        -- LOB_DATA, which is where the columnstore segments are stored
        LEFT OUTER JOIN nci nci
            ON    nci.[object_id] = i.[object_id]
WHERE    i.index_id <= 1
AND        i.[type] in (5)        -- Clustered columnstore
AND        i.[object_id] > 100
GROUP BY i.[object_id]
        ,    i.name
        , i.index_id
        , i.type_desc
        , nci.NumIndexes
        , nci.GBytes
        , p.data_compression_desc
ORDER BY [TotalTableSize] DESC

0 Comments

Identifying Netezza Backupsets

This is more a note to self than anything…

Each week we run FULL backups against our databases and write the backup files to disk.  In the past we kept the most current backup on disk for a week, and deleted it after the backups had been saved to tape.

Then we start the cycle all over again and write the latest backup to disk.  We never had more than one backup retained on disk at a time.

To add to our vulnerability, we deleted the previous week’s backup on Friday morning.  The deletes take about 6 hours… Then on Friday at midnight we start the next full backups.

But we are particularly vulnerable between the DELETE and the next FULL backup.  There’s about an 18 hour period where we don’t have any backups on disk.

To mitigate that, we are going to keep 2 weeks of backups on disk.  But before we write the 3rd backup we need to delete the first (oldest) backup.

The following command will identify the 2nd oldest backupset.  Once we have the backup set, we have all the info we need to delete the directory structure of the 2nd oldest backup before running the next FULL backup.

nzbackup -history -db myDBName | grep FULL | tail -2 | head -1 | awk '{print $2}'

There’s nothing complicated going on here, even though it looks like a lot.  To get a better feel for it, run the pipeline one command at a time, left to right, and watch what each successive stage adds.

Stay curious; keep learning.

0 Comments

Bug in sys.fn_hadr_backup_is_preferred_replica

While testing backups in SQL Server 2014 Availability Groups I hit a bug in a system function that is used to determine which replica is the most appropriate to be backed up.

You can set the desired backup preference by right-clicking on the availability group to bring up its properties:

 

image

 

I’ve decided I want my backups to happen on my PRIMARY replica, leaving my READ ONLY reporting replica available for client reporting.

Then I set up a backup maintenance plan, using the out-of-the-box Maintenance Plan builder, to handle the FULL, DIFF and LOG backups.

The script used in the maintenance plan looks something like this:

[sourcecode language='sql'  padlinenumbers='true']
DECLARE @preferredReplica int

SET @preferredReplica = 
(SELECT [master].sys.fn_hadr_backup_is_preferred_replica('myDB'))

IF (@preferredReplica = 1)
BEGIN
    BACKUP LOG [myDB] TO  
	DISK = ... <The rest of the backup command>
END
[/sourcecode]

The bug was that the function sys.fn_hadr_backup_is_preferred_replica always returned zero.  This resulted in SQL Agent jobs that completed successfully but never actually ran a backup.

In this particular case the server was created from a VM template. The servername did not match the instance name, which caused the function to always return zero.
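
A quick way to spot the mismatch is simply to compare the two names side by side:

[sourcecode language='sql' ]
-- @@SERVERNAME comes from sys.servers (set at install or via sp_addserver);
-- SERVERPROPERTY('ServerName') always reflects the actual machine/instance name.
SELECT @@SERVERNAME                  AS name_in_sys_servers
     , SERVERPROPERTY('ServerName') AS actual_instance_name;
-- If these don't match, sys.fn_hadr_backup_is_preferred_replica can misbehave.
[/sourcecode]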

The fix… Rename the @@servername value for all instances in the AG.  Something like this did the trick:

[sourcecode language='sql' ]
SELECT @@servername; -- Returned CHANGETHIS
EXEC sp_dropserver 'CHANGETHIS';
EXEC sp_addserver 'HSVSQLAG02D', local;
[/sourcecode]

A quick restart of the instances and voila, the log backups actually worked.

Back to testing AG backup strategies.

Stay curious; keep learning…

0 Comments

View to get DB Size

DBCC SQLPERF(LOGSPACE) is a handy command for checking transaction log sizes, but it does have some limitations:

  • The log file size is reported in KB.  All of our databases are sized in GB, so I end up mentally converting the KB output into GB.
  • Saving the data to a table is not convenient.
  • It provides the log file size, and the percent used, but the actual amount of data in the log is left as an exercise for the reader.

My first attempt at producing an alternative view of SQLPERF(logspace) just included the log file information from the SQL performance counters. 

My colleague Max MA (t) suggested adding counter values for the database data file size.  He also suggested replacing a calculated value for percent of log used with a counter value. 

I agreed with adding the data file size counter; I didn’t like the log percent used counter because it is reported as a rounded integer.

Here is the query, which happens to make a convenient view:

[sourcecode language='sql'  padlinenumbers='true']
CREATE VIEW dbo.v_GetDBSize AS
	SELECT	DatabaseName
			, [Data File(s) Size (KB)] AS [Data File(s) Size (GB)]
			, [Log File(s) Size (KB)] AS [Log File(s) Size (GB)] 
			, [Log File(s) Used Size (KB)] AS [Log File(s) Used Size (GB)]
			, CAST(CASE WHEN [Log File(s) Size (KB)] = 0.00 THEN 0.00
					ELSE [Log File(s) Used Size (KB)] / [Log File(s) Size (KB)] * 100.0 
				END AS NUMERIC(18, 2)) AS [Log % Used]
	FROM	(	SELECT  instance_name as DatabaseName
						, ltrim(rtrim(counter_name)) as counter_name
						, CAST(cntr_value / 1024. / 1024. AS NUMERIC(18, 2)) AS [Storage(GB)]
				FROM	sys.dm_os_performance_counters
				WHERE	counter_name IN ('Data File(s) Size (KB)'
										, 'Log File(s) Size (KB)'
										, 'Log File(s) Used Size (KB)') ) as SourceTable
	PIVOT	(	SUM ([Storage(GB)])
				FOR	counter_name IN ([Data File(s) Size (KB)]
									 , [Log File(s) Size (KB)]
									 , [Log File(s) Used Size (KB)] ) ) AS PivotTable;
[/sourcecode]

Here is the sample output:

image

Since the code tends to run off the edge of the blog page, here is a readable version of it:

image

This is simple stuff but makes checking log files and data file growth convenient.  This could easily be used as part of a regularly collected baselines routine.
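
For example, a bare-bones baseline collector might look something like this (the baseline table name is just an assumption; adjust it to whatever your collection database uses):

[sourcecode language='sql' ]
-- One-time setup: create an empty baseline table shaped like the view plus a capture time.
SELECT	CAST(CURRENT_TIMESTAMP AS datetime) AS CaptureTime, v.*
INTO	dbo.DBSizeBaseline
FROM	dbo.v_GetDBSize v
WHERE	1 = 0;

-- Scheduled collection step (e.g., a SQL Agent job):
INSERT INTO dbo.DBSizeBaseline
SELECT	CURRENT_TIMESTAMP, v.*
FROM	dbo.v_GetDBSize v;
[/sourcecode]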

Stay curious… Keep learning…

0 Comments

Netezza: Restore a table from an INCREMENTAL Backup

Netezza supports 2 types of backups:

  1. FULL
  2. INCREMENTAL
    • CUMULATIVE – All changes since the last FULL backup
    • DIFFERENTIAL – All changes since the last FULL or CUMULATIVE

The first step in restoring a single table is to see what backups we have.  Note that NZ calls the FULL backup increment #1.

Determine what incremental backups we have.  In this case we have a FULL + 1 DIFF backup.

[sourcecode language='sql'  padlinenumbers='true']
$ nzrestore -connector filesystem \
 -dir /backup1 /backup2 /backup3 \
 -sourcedb mySourceDB -incrementlist

Database      Backupset      Seq # OpType NPS Host
------------- -------------- ----- ------ ----------
mySourceDB    20150502072114 1     FULL   MYnetezza
mySourceDB    20150502072114 2     DIFF   MYnetezza
[/sourcecode]

Restore a single table (to a different database) up to the latest backup:

[sourcecode language='sql' ]
$ nzrestore -db MY_TEST -connector filesystem \
-dir /backup1 /backup2 /backup3 \
-sourcedb MySourceDB -tables MyTable

Restore of increment 1 from backupset 20150502072114 to database 'MY_TEST' committed.

Restore of increment 2 from backupset 20150502072114 to database 'MY_TEST' committed.
[/sourcecode]

 

If you want to limit the restore to the FULL (or an intermediate DIFF) add the -INCREMENT argument:

[sourcecode language='sql' ]
$ nzrestore -db MY_TEST -connector filesystem \
 -dir /backup1 /backup2 /backup3 \
-sourcedb MySourceDB \
-tables MyTable -increment 1

Restore of increment 1 from backupset 20150502072114 to database 'MY_TEST' committed.
[/sourcecode]

 

Some general comments:

  • The above backups are for backup files saved on a disk (-connector)
  • The backups are spread across 3 drives (-dir)
  • We are restoring to a different database (-db <> -sourcedb)
  • Multiple tables can be restored by adding their names to the -tables argument, space delimited.  Table names are case sensitive.
  • If the table does not exist, it will be created.
  • If the table does exist, use the -droptables argument to drop the table before the restore starts.

I’m sure I will reference this the next time I need to restore a table…

Stay curious; keep learning…

0 Comments

Filtered Indexes and CONCAT_NULL_YIELDS_NULL

Another lesson learned at the school of hard knocks…

I recently deployed a filtered index to a table that had a FLAG column.  I showed without a doubt in the test environment that this filtered index would be used by production queries at a much lower cost than without the index.

What I didn’t know was that the developers of the ETL decided to turn off CONCAT_NULL_YIELDS_NULL.

The net result was the first night the ETL ran we got an error message (from the MERGE statement) indicating that CONCAT_NULL_YIELDS_NULL was turned off and was impacting an indexed view.

It is well known, at least to me, that indexed views have numerous restrictions, including requirements on several SET options, and for good reason.

Unfortunately, we didn’t have any indexed views on the database.

After much fussing around, we determined that the root cause of this blockage was the filtered index.

A review of the Filtered Index Guidelines for SQL Server 2008R2 makes no mention of the CONCAT_NULL_YIELDS_NULL restriction.

It is mentioned on the CREATE INDEX MSDN page.

The error that was captured from our ETL logs looked like:

1934 AS ErrorNumber, 16 AS ErrorSeverity, 1 AS ErrorState,  AS ErrorProcedure, 1 AS ErrorLine, MERGE failed because the following SET options have incorrect settings: ‘CONCAT_NULL_YIELDS_NULL’. Verify that SET options are correct for use with indexed view

This was misleading since we did not have any indexed views in the database.

I ran the following code on a local SQL 2014 instance to test the “theory” for the filtered index:

[sourcecode language='sql'  padlinenumbers='true']
CREATE TABLE myTable (	ID	int	identity(1, 1) not null
						, myCharData	char(3) null
						, myFlag		char(1) null );

insert into myTable 
	values (	'abc', 'Y')
			, ('def', 'N')
			, ('ghi', NULL);

create nonclustered index idx_myfilteredindex 
	on	myTable (myFlag)
		where myFlag = 'Y';

-- set compatibility to sql 2008R2;
alter database mySandbox
	set compatibility_level = 100;

set CONCAT_NULL_YIELDS_NULL OFF;

update	myTable
set		myCharData = 'xyz'
where	ID = 2;
[/sourcecode]

This proves that the filtered index does not even need to be part of the query to manifest the error.

The error that was presented also included a reference to filtered indexes:

UPDATE failed because the following SET options have incorrect settings: ‘CONCAT_NULL_YIELDS_NULL’. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or filtered indexes and/or query notifications and/or XML data type methods and/or spatial index operations.

Since changing CONCAT_NULL_YIELDS_NULL is discouraged, and the ability to set it OFF will be removed in a future release, it is in our own best interest to leave this setting alone.

Rolling this out of the existing ETL is probably not a short term action item, but we better not be using it on any new releases.

The other takeaway from this… even the simplest changes need to be tested with an end-to-end ETL run.  It’s better to discover this in test than in production, any day of the week.

Stay curious… Keep learning…

0 Comments

Change Default Backup Location on SQL Server

This is just a quick post of a task that seldom needs to happen… Changing the default location of the backup directory.

There is no way to change this setting in SSMS…

The quickest way is to use an extended stored procedure.

[sourcecode language='sql'  padlinenumbers='true']
DECLARE @BackupDirectory VARCHAR(100) 
EXEC master..xp_regread @rootkey='HKEY_LOCAL_MACHINE', 
  @key='SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQLServer', 
  @value_name='BackupDirectory', 
  @BackupDirectory=@BackupDirectory OUTPUT 
SELECT @BackupDirectory

EXEC master..xp_regwrite 
     @rootkey='HKEY_LOCAL_MACHINE', 
     @key='SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQLServer', 
     @value_name='BackupDirectory', 
     @type='REG_SZ', 
     @value='\\hsdd1\Litespeed\UAT\HSPSQLHFC01T'
     
DECLARE @BackupDirectory VARCHAR(100) 
EXEC master..xp_regread @rootkey='HKEY_LOCAL_MACHINE', 
  @key='SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQLServer', 
  @value_name='BackupDirectory', 
  @BackupDirectory=@BackupDirectory OUTPUT 
SELECT @BackupDirectory
[/sourcecode]

You may need to run regedit and “find” the BackupDirectory key location on your instance.

For a much more comprehensive post on this topic, please refer to Greg Robidoux’s post.

Stay curious, keep learning…

0 Comments

LiteSpeed Restore from Remote Share v8.0

In versions of LiteSpeed for SQL Server prior to 8.0 I’ve always been able to restore backups from remote shares by cutting and pasting the remote share path (\\serverName\driveLetter$\backups).  In fact, the remote shares are not always actual shares.  If they were, LiteSpeed would pick them up in the “browser”.  Often they are just unshared paths that I can access using administrator rights.

After upgrading LiteSpeed to 8.0 I discovered that my remote paths are no longer recognized as valid when I enter them manually.  The obvious workaround is to copy the backup file(s) from the remote server to a local drive.  For small database backups this may be fine.  But for backup files that are hundreds of megabytes in size, the copy alone can take hours.

For DEV and UAT (user acceptance test, aka TEST) machines most of the restores will be coming from a production server.  Basically, once in a while our developers will want a refresh of the existing DEV/UAT databases from the production server.  This is where the issue of restoring from remote shares is the most common.

Instead of copying huge backup files from the remote servers, I found it much easier to change the default backup path of the local (DEV/UAT) servers to point to the remote server’s backup directories.

Unfortunately, there is no interface in SSMS that will allow us to change the default backup path; it is a registry setting.  We have two ways to change the registry setting:

  1. RegEdit
  2. Extended procedure xp_regwrite

This post will describe how to use the extended stored procedures to change the default backup directory, and then use LiteSpeed to restore a DEV/UAT database from a remote production server.

I did find it necessary to use regedit  to determine the path of the BackupDirectory key.  It seems to be a little different for each instance, depending on the instance name.  The path will be something close to the following:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQLServer

RegEdit

Copy the full path by right clicking on MSSQLServer ==> Copy Key Name.  You’ll need it for the call to the extended procedure to change the path value.

Paste the key from regedit into the following xp command to determine the current default backup directory:

DECLARE    @BackupDirectory VARCHAR(255)
EXEC master..xp_regread @rootkey='HKEY_LOCAL_MACHINE',
    @key='SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQLServer',
    @value_name='BackupDirectory',
    @BackupDirectory=@BackupDirectory OUTPUT
SELECT @BackupDirectory

This will return the current default backup path.  To change the default backup path to the remote server’s backup directory:

EXEC master..xp_regwrite
     @rootkey='HKEY_LOCAL_MACHINE',
     @key='SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQLServer',
     @value_name='BackupDirectory',
     @type='REG_SZ',
     @value='\\myRemoteServer\j$\Backups';

Now when you restore a database in LiteSpeed 8.0 the “Backup Directories” will expand to the directory structure on the remote server.  Just click through the directories as needed and select your backup files (FULL + DIFF for each stripe) and restore as usual.

LiteSpeedRestore

Note: This is only necessary if the backup directory is not explicitly shared.  Shared drives can be found by selecting the Browse Network.  It may take a little more effort to find the specific server, but the share will be presented.

Thanks to Greg Robidoux for his post Changing the default SQL Server backup folder.

If you do decide to make this change be sure to write LOCAL backups to the local backup drive.  By now your backups should already be configured in LiteSpeed, but this may be a point of confusion if a new person is introduced to your backup process.

Stay curious; keep learning…

0 Comments

Color code SSMS status bar

I work with all kinds of SQL Server instances.  All of them are one of three possible types:

  • Development
  • Test
  • Production

Many of my scripts have a quick query at the top to get the server name (@@servername).  I use this to double check the server instance I’m on.  More specifically, I use this to make sure I’m NOT on a production server (when not deploying production changes).

Andy Menon, a co-worker and all around genius, showed me how to change the color of the status bar depending on what server SSMS is connected to.  He suggested changing the color based on the type of the server connection, such as:

  • Development – GREEN (who cares, hack away, you’ve got the green light)
  • Test – YELLOW (caution, you may end up jacking up someone’s code)
  • Production – RED (STOP… do you know you are on a prod server?)

By making a simple change in the connection box for all of your servers your status bars can look like this:

Development Server:

image

Test Server:

image

Production Server:

image

There’s no new science here, but if it can spare me just one time from unintentionally executing code on a production machine, it is well worth the time to set this up.  Plus, I always know where I am!  What a concept…

Here’s a quick guide on how to do it:

  • In SSMS open a new database connection in Object Explorer.
  • Click on “Options”:

image

  • Click on Connection Properties

image

  • Click Use custom color.
  • Pick your color.
  • OK… Connect.

Here’s the official Microsoft link on how to do this in Object Explorer (and Central Management Server).

http://msdn.microsoft.com/en-us/library/hh213617.aspx

Stay curious, keep learning…

0 Comments

Using EVENT NOTIFICATION to ALERT on user query CPU usage

We often have customers running ad hoc queries on the databases that we host.  Most of the time the queries run fine… sometimes they don’t.

The first step we took towards managing rogue queries was to deploy Resource Governor (only available in Enterprise or Developer editions) to limit their MAXDOP.  In general, a rule of thumb is to limit ALL queries to the number of physical cores on a single socket.  (The basis for this rule is related to NUMA, and SQL Server’s “NUMA awareness”, which I will not go into here.  If you really want the authority on these matters check out Glenn Berry’s work.)  By limiting the MAX DOP we can at least take comfort in knowing that all of the CPUs will not be consumed by a bad query.

The next step was to create an alert that fires when the user query (not system, an individual query) reaches a specified CPU threshold, in seconds.

It turns out there is a simple way to do this using the standard alert configurations in SSMS.  Unfortunately, the counter behind the alert never changes from zero (at least not in SQL 2008R2SP2CU3).  So I scrapped that approach.

The next approach involved SERVICE BROKER, EVENT NOTIFICATIONS and in this case, RESOURCE GOVERNOR.  That sounds like a lot of moving parts to create an alert.  Fortunately, I found an article by Aaron Bertrand, who leveraged work by Jonathan Kehayias, that provided me with a good foundation to develop my own, extensible alerting framework.

The following is an excerpt from an email that was sent to my internal team.  I had to tweak it a little to make it more generic, but the concepts remain the same.

I’ve added the TSQL in the New Downloads section (to the right of this screen) called Event Notifications Framework.  You can use that code to build your own set of trace event alerts.  I only request that you not remove any of the credits, myself as well as the link to Aaron’s post.

—————————————————————————————————

OBJECTIVE: Create a database alert that will fire an email to support whenever an external user query exceeds a CPU threshold, such as 1 minute or 1 hour.

Q: I thought we already had a “longest running transaction” alert.  What’s the difference?

A:  We do.  The LONGEST RUNNING TRANSACTION alert fires when a TRANSACTION (INSERT, UPDATE, DELETE, MERGE) runs for longer than a fixed threshold.  For HFC that threshold is 8 hours.  There are intrinsic weaknesses in this alert:

  • It does NOT apply to SELECT queries, which are usually not part of a TRANSACTION.
  • If multiple transactions are running above the 8 hour threshold, the alert will only fire for the first transaction that crosses the threshold.  If the second transaction finishes before the first transaction, it will never be recognized.
  • Simply put, this counter does nothing to warn us of runaway, ad hoc SELECT queries.

Q: Why didn’t we create a simple “long running SELECT query alert” in the first place?

A:  It should be possible to create a high CPU alert using a standard SQL Server alert configuration.  The configuration screen below “should” work… but it doesn’t.  It turns out that the counter is always zero.  It is a bug that was identified in the original release of SQL Server 2008 and has not been corrected in any of the subsequent service packs (SP) or cumulative updates (CU).  Don’t worry… I tried it anyway.

clip_image002

Q: How does this “alternative” approach work?

A:  First, a couple of SQL Server subsystems need to be configured.

  • Deploy Resource Governor (RG).  RG makes it really easy to:
  • Categorize user logins into certain “buckets” using a “classifier function”.  We then use these buckets to tell the SQL Engine how many resources the users in those groups can consume.  (In most cases, only when there is contention on the system.)
  • “Limit” any user’s query to a specified number of CPU seconds. We can do it when we configure RG by setting the REQUEST_MAX_CPU_TIME_SEC.  (In SQL Server, a “request” is just a confusing term for a query of any sort.)

clip_image004
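
In case the screenshot above doesn’t come through, here is roughly what that RG configuration looks like in T-SQL.  The group and pool names are made up; the 5 second threshold matches the one described below, and the classifier function (not shown) is what routes ad hoc logins into the group:

[sourcecode language='sql' ]
-- Hypothetical names; a classifier function (not shown) routes ad hoc logins into this group.
CREATE WORKLOAD GROUP [AdHocUsers]
WITH ( REQUEST_MAX_CPU_TIME_SEC = 5    -- raises the CPU threshold exceeded trace event; does NOT kill the query
     , MAX_DOP = 8 )                   -- hard cap on parallelism
USING [default];

ALTER RESOURCE GOVERNOR RECONFIGURE;
[/sourcecode]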

  • Next we need to configure SERVICE BROKER (another subsystem of the SQL Server engine) to send out the alert.  SERVICE BROKER is a fancy (misleading) name for an internal, queue-based, asynchronous messaging system.  Specifically to this alert, when the CPU time for a user query goes above the 5 second threshold, an XML “message” is placed on a special SB queue.  Once it is on the queue, it will stay there until some other process comes along and “consumes” it.
  • EVENT NOTIFICATIONS is what makes it possible for the trace event raised by Resource Governor to be placed on the SB queue.  There is nothing to configure… it just runs.  Event notifications were introduced in SQL Server 2005, so it’s hardly new technology.
  • A special stored procedure “service” was written that acts as the consumer of the XML events that are placed on the SB queue.  Ultimately, it sends out the email alert with the details of the event.

Q: Why do we need SERVICE BROKER/EVENT NOTIFICATIONS to process the event?  Shouldn’t Resource Governor have some way to flag an administrator if the CPU threshold is exceeded?

A: For starters, we really would not want RG to be automatically killing queries that exceed a certain CPU threshold.  The ONLY response RG does when the CPU threshold is exceeded is raise a “trace event”.  Once the trace event is raised, EVENT NOTIFICATIONS places the event on the SB queue.  What we DO with that information is entirely up to US, the watchdogs of our systems.  There are other “limits” that RG does place a hard stop on, such as the MAX DOP.  But in the case of the CPU threshold, it doesn’t make sense.  The response to the event simply needs to be customized.

Q: What will the email alert look like?

A: The email message will prominently include:

  • The originating server
  • The CPU time in milliseconds that the query has been running before the ALERT was processed from SB
  • The SPID of the offending query session.

RESOURCE GOVERNOR is only available in the ENTERPRISE edition of SQL Server (SERVICE BROKER ships with every edition).  Both are at the top of the list of subsystems that are too often not used (aka, least understood).  But their potential value is immense.

clip_image006

Q: If the alert fires, how can we determine which user queries were running at the time of the event?  We may not see the alert until several hours after the event occurs.

A:  One of the “responses” that has been configured is to call sp_whoisactive and save the results into a table in DBA_BASELINES.  This will capture all requests (active queries) that are running on the system.  From there we should be able to narrow down the offending query/user very quickly. (Amplification: We have a SQL Agent job that calls Adam Machanic’s sp_whoisactive, and writes the result set into a table.)
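
The capture job itself is nothing more than a call along these lines (the snapshot table name is an assumption; it has to exist already with a column layout that matches sp_WhoIsActive’s output for the options you choose):

[sourcecode language='sql' ]
-- Assumes Adam Machanic's sp_WhoIsActive is installed in master and the snapshot
-- table has already been created with a matching column layout.
EXEC master.dbo.sp_WhoIsActive
     @get_plans         = 1
   , @destination_table = 'DBA_BASELINES.dbo.WhoIsActive_Snapshot';
[/sourcecode]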

Q: Can the CPU threshold fire even if the elapsed time (wall clock) of the query is less than the threshold?

A: Based on my knowledge of query analysis, YES.  A CPU bound query, such as one that is dominated by a large SORT operation, that is running on multiple CPUs, could easily use up the 5 seconds of CPU in less than 5 seconds of ELAPSED time.  In an extreme case, 5 seconds could be used by 4 CPUs running in parallel in 1.25 seconds.

Q: In the sample email message the reported CPU time is 8.376 seconds.  Why didn’t the event fire on 5 seconds, which is the preset threshold value in RESOURCE GOVERNOR?

A: Service Broker is an ASYNCHRONOUS queue manager.  This means that it will process the incoming event when it can; which may not be immediately.  The delay could be UP TO 5 seconds after the trace event fires.  In the real world we will be concerned with queries that are running for at least 15 minutes.  Another 5 seconds (max) added to the runtime is insignificant.

Q: If the user queries runs for hours and hours through the night will my inbox be inundated with this alert?

A: No.  This alert will fire ONCE and ONLY ONCE for each query that exceeds the CPU threshold.

Q: Will this approach create a high overhead on these busy database servers?

A: Two out of the three components of this system are always running anyway… SERVICE BROKER and EVENT NOTIFICATIONS.

SQL Server uses SB for its own management of processes.  All instances of SQL Server have SB running in the background.  It’s engineered into the SQL Engine.  EVENT NOTIFICATIONS is very lightweight and consumes minimal system resources.  It’s running all the time for SQL Trace, Profiler and the DEFAULT trace.

The only “optional” component of this solution, RESOURCE GOVERNOR, which has already proven to be a valuable tool for managing system resources, is already in place.  By all accounts, this approach to converting trace events into alerts has a very low overhead on the database engine. (Note: Resource Governor is NOT needed to process trace events in general; it is only needed here because the CPU trace event is raised by its REQUEST_MAX_CPU_TIME_SEC limit.)

Q: If we now have a FRAMEWORK to process alerts, what other “events” can be added to our pool of interesting alerts?

A: Anything that can be monitored in SQL Profiler (or SQL TRACE) can be monitored and alerted on using EVENT NOTIFICATIONS.  For the CPU Usage alert we only needed RESOURCE GOVERNOR to set the max threshold.  Otherwise, it isn’t really needed for this framework to function.  [I believe DDL triggers can also be processed with this framework but I have not tested any.] Adding any event that is available from sys.trace_events requires two simple steps:

1. Create a new notification.  Such as…

clip_image008
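
If that image doesn’t render, the notification itself is a one-liner along these lines.  The service name is whatever the framework created (an assumption here), and DATA_FILE_AUTO_GROW is just an example of another interesting trace event:

[sourcecode language='sql' ]
-- Hypothetical service name; the queue and service are created by the Event Notifications Framework.
CREATE EVENT NOTIFICATION enDataFileAutoGrow
ON SERVER
FOR DATA_FILE_AUTO_GROW
TO SERVICE 'EventNotificationService', 'current database';
[/sourcecode]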

2. Add a block of code to the Service Broker “Service” dbo.dba_processEventNotifications.  This custom “service”:

  • Pops the trace events off of the SB queue.
  • Reads the XML out of the SB message.
  • Formats the email SUBJECT and BODY content and sends out the DB Mail message that contains the alert information.

Basically, the following block of code would need to be added.  The @message section would need to be tweaked to pull out the relevant attributes from the XML message.  For example, CPU would not be relevant to an alert triggered by a database file autogrowth, but the database file name would be of interest.

clip_image010
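
Again, in case the image doesn’t render, the block being described is roughly this shape.  Treat it as a sketch only: the variable names, the event-type string, and the XML element names are assumptions based on the standard EVENT_INSTANCE schema, the mail profile and recipients are placeholders, and the real code lives in the download.

[sourcecode language='sql' ]
-- Sketch of one handler block inside dbo.dba_processEventNotifications.
-- @messageBody (xml) and @eventType are populated earlier in the procedure when the
-- message is RECEIVEd off the queue.
IF @eventType = N'CPU_threshold_exceeded'
BEGIN
    DECLARE @spid   int = @messageBody.value('(/EVENT_INSTANCE/SPID)[1]', 'int');
    DECLARE @cpu_ms int = @messageBody.value('(/EVENT_INSTANCE/CPU)[1]',  'int');

    DECLARE @subject nvarchar(255) =
        @@SERVERNAME + N': user query exceeded the CPU threshold (SPID '
        + CAST(@spid AS nvarchar(10)) + N', ' + CAST(@cpu_ms AS nvarchar(20)) + N' ms)';

    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = N'DBA Alerts'            -- placeholder Database Mail profile
       , @recipients   = N'dba.team@example.com'  -- placeholder
       , @subject      = @subject
       , @body         = @subject;
END
[/sourcecode]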

Once again, the source code for this is available in the NEW DOWNLOADS section.

Stay curious, keep learning…

0 Comments

Return First and Last Day of Previous Month

I had a colleague approach me today with a production issue.  The base challenge was to produce a query that returns the first and last day of the previous month.  When I reviewed the query he provided (i.e., the one that was in the production code) I quickly determined that it was too complicated for the amount of work it needed to do.  Worse yet, the results it produced were wrong some of the time.

So I set out to re-write it “my way”. 

If you don’t want to read the details of how this works let me just give you the query that I created.  The following code snippet will return the first and last day of the previous month, for each day in 2014.

[sourcecode language='sql'  padlinenumbers='true' wraplines='true' htmlscript='false']
use tempdb;
go

CREATE TABLE #mydates (mydatetime	datetime
						, startPrevMonth	datetime
						, endPrevMonth		datetime
						, startdateid		int
						, enddateid			int )

truncate table #mydates;

declare	@myDatetime	datetime;
set		@mydatetime = '1/1/2014';

WHILE @mydatetime < '1/1/2015'
BEGIN

	INSERT INTO #mydates
		SELECT	@mydatetime AS myTime
				, d.Start_of_Prev_Month
				, d.End_of_Prev_Month
				, d1.date_id AS Start_of_Prev_Month_ID
				, d2.date_id AS End_of_Prev_Month_ID
		FROM		(	SELECT	DATEADD(day
													, (DATEPART(day, (DATEADD(month, -1, @mydatetime) )) * -1) + 1
													, DATEADD(month, -1, (cast(cast(@myDatetime AS DATE) AS datetime))) 
						
									 ) AS Start_Of_Prev_Month
							, DATEADD(ms, -2, (DATEADD(month, 1, (DATEADD(day
													, (DATEPART(day, (DATEADD(month, -1, @mydatetime) )) * -1) + 1
													, DATEADD(month, -1, (cast(cast(@myDatetime AS DATE) AS datetime))) 
						
									 ))))) AS End_Of_Prev_Month ) d
				INNER JOIN GMCR_Mart.dbo.d_date d1
					ON	d1.the_date = d.Start_of_Prev_Month
				INNER JOIN GMCR_Mart.dbo.d_date d2
					ON	d2.the_date = CAST(d.End_Of_Prev_Month AS DATE);

	SET @myDatetime = DATEADD(day, 1, @myDatetime)

END

select * from #mydates
[/sourcecode]

There were some interesting findings while developing this simple query.

  1. Incorrect results were returned by the initial cut at this query when the month we were looking at had more days than the previous month.  For example, March has 31 days, but February has either 28 or 29.  July has 31 days but June has 30.  The same holds true for Oct/Sep and Dec/Nov.  January has 31 days, but so does December, so no issues were encountered there.
  2. The SQL DATEADD function is pretty intelligent.  If we are subtracting a month from March 29, it will return February 28 (for non-leap years).  It will also return February 28th for March 30 and 31.  This is a definite plus… somewhat obvious.  But if it didn’t work as it does we would have other problems.  The following queries all return Feb 28, 2014:
    [sourcecode language='sql' ]
    select dateadd(month, -1, '3/28/2014')
    select dateadd(month, -1, '3/29/2014')
    select dateadd(month, -1, '3/30/2014')
    select dateadd(month, -1, '3/31/2014')
    [/sourcecode]
    
  3. One of the tasks we needed to do was to strip out the time component of a datetime data type (I know, deprecation city.  Developer habits are tough to change.).  Itzik Ben-Gan’s T-SQL Fundamentals (SQL 2008) has a nifty expression that adds the whole-day difference from a reference date back onto that reference date, which drops the time.  Here’s an example of his code.  Although it is simple enough, it’s not obvious what it does if you are looking at it quickly.
[sourcecode language='sql' ]
select dateadd (day, datediff(day, '20010101', '3/29/2014 10:24:54.123'), '20010101')
[/sourcecode]

A more intuitive approach is to cast the date/time to a DATE datatype, then back to the datetimeX datatype.  Of course, this approach will not work for SQL2005 and older.

[sourcecode language='sql' ]
select	cast(cast('3/29/2014 10:24:54.123' AS DATE) AS datetime)
[/sourcecode]

Now, for the interesting part.  What I discovered from my initial query (aka, not correct) was that when we had a 31 day month that was preceded by a 28 or 30 day month, the previous month’s begin and end dates returned were incorrect.

Here is what the original query looked like (returns just the previous month’s first day… the last day had the same problem):

[sourcecode language='sql' ]
-- Wrong way!!!
select DATEADD(	day
				, (DATEPART(day, '3/29/2014' ) * -1) + 1 --********** WRONG!!! Don't subtract 29 days from a 28 day month! *******
				, DATEADD(month, -1, (cast(cast('3/29/2014 10:24:54.123' AS DATE) AS datetime))) )
[/sourcecode]

This query returns "2014-01-31 00:00:00.000".  Obviously that is not the first day of the previous month relative to March 29th.  Because the expression uses March’s day number (29) instead of February’s, it backs up too far and lands in January.  Go figure…
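
For contrast, here is the corrected expression pulled out of the query above, with literals substituted in.  The trick is to take the day number from the previous month (February), not from the original month (March):

[sourcecode language='sql' ]
-- Right way: use the previous month's own day number to back up to its first day.
select DATEADD(	day
				, (DATEPART(day, (DATEADD(month, -1, '3/29/2014'))) * -1) + 1
				, DATEADD(month, -1, (cast(cast('3/29/2014 10:24:54.123' AS DATE) AS datetime))) )
-- Returns 2014-02-01 00:00:00.000
[/sourcecode]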

Hopefully the code snippets in this post come out OK.  I’m a little rusty at posting code, and this is a new WordPress plug-in.

Thanks to Rohit Shinde for bringing this fun puzzle to my attention.

Stay curious; keep learning

6 Comments

Size Matters: Keep Model Small

I recently discovered an undesirable behavior of SQL Server on a brand new SQL 2008R2SP2 cluster.  If the cluster failed over once it was fine.  If it was immediately failed back to the original node, it would hang when taking SQL Agent offline.

The cluster had two physical nodes (01P and 02P) and three (3) SQL Server instances (AP, BP & CP).  It turns out that neither the hardware nor the OS was a contributing factor, so there is no point in describing their configurations.

The instance that was presenting the most problems was for a new client.  In fact, the issue was first discovered when the cluster was not even in PRODUCTION yet.  This afforded us the luxury, at least early on, of being able to fail the instance over almost on an as-needed basis.  Eventually the cluster did go into production, so running tests and collecting logs involved a more rigorous process.

The databases on the instance were fairly small, by our standards.  On the order of 100GB.  The databases are not transactional.  For the most part they are load-and-report databases, the classic OLAP.

I worked with Microsoft support over a four month period.  The SQL Server support engineer, Pradeep M. M., and I had sort of a doctor-patient relationship.  He would prescribe the test, I would submit the test results to him.

Most of the tests were in the form of setting trace flags to enhance logging to the SQL error log, failing the cluster over, and providing the logs for analysis.

Over the course of the four months we did upgrade from cumulative update 3 (CU3) to CU8, and eventually CU12.

While nothing in the logs pointed directly to the root cause, it was discovered fairly early on that tempdb was taking upwards of 3 minutes to come online at startup.  We checked to ensure Instant File Initialization was correctly set up (which only applies to the data files, not the log file).  With two 40GB tempdb data files at initialization, plus an 8GB log file, we are dealing with a small (by our standards) tempdb.

This would explain why the instance would hang when taking SQL Agent offline.  Agent couldn’t go offline until the instance was fully online, and without tempdb, the instance was still coming up.

We checked other instances that we manage and discovered that most instances of SQL Server were able to initialize tempdb in 5 seconds or less, at service startup, even for instances where tempdb was up to 800GB.

Of course, any time we start seeing IO related issues, we always blame the SAN, right?  Nothing was discovered on the SAN side.

What about the hardware?  Could drivers be out of date?  Sure, but we have 3 instances running on 2 nodes, and both nodes are brand new blade servers.  The AP and CP instances worked fine regardless of the physical node they were on, while the BP instance consistently hung regardless of the physical node it was on.

At that point it is fair to conclude that we are not dealing with an external resource constraint, either at the SAN or physical hardware level.  Most likely, the root cause is at the instance level.

The root cause of the problem was that the MODEL database had been configured with a 1GB initial data file.  The short version of the explanation is that, even though tempdb is configured with its own attributes (number of data files, log file, autogrowth, growth increments, etc.), at service startup it is initially created from MODEL.

Rather than re-invent the wheel, I’ll provide Pradeep’s detailed explanation:

When SQL Server starts it re-creates tempdb every time; If it has to automatically create the database it has to first set some default properties and to achieve this it uses model database as a template and then once the initial template for the tempdb is ready it goes ahead and reads the actual configuration that has been kept from the master database for the tempdb files and then tweaks the same. Now when it copies the model database as a template and carves out tempdb from it doesn’t mean that it uses a windows API call to copy the file and paste the file and then move ahead. It cannot use this because SQL server internally maintains file header ( System Pages like PFS , GAM ,SGAM ) and other integral structures and it has to modify the file header and change other internal pages to remove model reference and add tempdb references to it.  Due to this it actually reads all the allocate and un-allocated extents from model in 64kb chunks and then copies them to the tempdb database. When compared against normal file copy operation this operation will be pretty slow. But in the current product this is how it has been designed. The bigger the model database the number of extents available will increase and it will take more time to copy the extents to the new database in 64kb chunks.

The above was the design which exists till SQL Server 2008 R2; In SQL Server 2012 some significant design changes have been made which avoids the problem completely and fastens the data copy of model database extents to tempdb as well as in SQL Server agent there is some code changes as well which avoids the waiting situation. At this point of time the same architecture changes cannot be ported back to SQL Server 2008 R2 as the product has already reached Extend Support Phase where in only security fixes will be made and any other new issues/design change requests will be denied.

Be mindful that this is only relevant through SQL Server 2008 R2.  As Pradeep indicated above, it is not an issue in SQL Server 2012.

I find it curious that this issue has not been raised in the 6 years that this product has been in the field.  At this point there may never be a KB article on it, since the product is past its mainstream support end date (July 8, 2014).

After resizing model back to the default 2MB data file and 1MB log, the delays in instance failovers disappeared.  That 80GB tempdb came online in less than 2 seconds.

How do you know if your server has the issue?

  1. Are you running SQL Server 2008R2 or earlier? And…
  2. Has your model database been changed to have a data file larger than 2MB? And…
  3. Run:
    1. Capture the @startTime = current_timestamp
    2. Create DATABASE myTest (ie., a plain copy of the model database)
    3. Capture the @endTime = current_timestamp
  4. If the delta between @startTime and @endTime is more than a second or two (e.g., 3 minutes to create that empty database), you probably have an issue with model.  A minimal timing sketch follows below.
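
Something like this works (run it somewhere harmless; myTest is a throwaway name):

[sourcecode language='sql' ]
-- Rough timing test: creating an empty database is essentially a copy of model,
-- so a multi-minute CREATE DATABASE points straight at an oversized model.
DECLARE @startTime datetime = CURRENT_TIMESTAMP;
CREATE DATABASE myTest;
DECLARE @endTime datetime = CURRENT_TIMESTAMP;

SELECT DATEDIFF(second, @startTime, @endTime) AS seconds_to_create;

DROP DATABASE myTest;
[/sourcecode]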

Lesson Learned: Contrary to popular belief, for SQL Server 2008 R2 (and earlier), leave the model data file size at the default 2MB.  Increasing the size of model will delay the creation of tempdb at service startup.

Stay curious, keep learning…

0 Comments

Mounting a TV in Fireplace

About a month ago I completed a home project that involved mounting my 40 inch flat screen TV “in” my existing gas log fireplace.  In order to convince my skeptical wife that the project would look nice I had to have a rollback plan that included restoring the fireplace to its original working condition.  That meant I could not damage the fireplace in any way, including drilling holes in the back of the firebox or any surface that could be seen from the family room.

It took me a while to figure out how to do this, but with a little ingenuity and online research I ended up finding the solution.  Simply put, I used plywood and magnets to cover and secure the fireplace.  It worked perfectly… well, at least my wife is happy with it.

Too often we see TVs mounted OVER the fireplace.  My family room is a decent size at 16’x20’, but I still felt a TV mounted over the fireplace would eventually put strain on my aging cervical spine.  Mounting the TV in the fireplace had the potential of having the opposite result: too low to the floor.  I was able to overcome this by choosing a mounting bracket that allowed the TV to be mounted with an upward tilt.  Up to this point neither of us has ever uttered a comment that the TV is too low to the ground.

The following picture shows my initial installation of the 3/8” plywood.  The piece I chose was “rough” on one side and smooth on the other.  You’ll notice that either the wood is not perfectly square or the firebox isn’t so I ended up using a couple of “feet” under the board to raise the board just enough so there is an optimal gap along the top and sides.

IMG_1186

Initially I thought I was going to want to have four handles on the outside surface.  My thinking was that even with the TV mounted two people could remove the entire assembly away from the firebox by pulling on the bottom handles to break the magnets away from the metal firebox while using the top handles to stabilize the TV and prevent everything from crashing to the floor.  After a fair amount of experimentation I discovered that the top handles were not needed and the final design did not include them.

IMG_1187

The following image shows the final configuration of the back side of the mounting board.  It took a fair amount of experimentation to arrive at this configuration so let me break it down.

IMG_1194

  1. The backside of the board was painted with this tan color… it was just a color I had hanging around the garage that I didn’t think I was ever going to use for anything else.  I wanted both sides of the board painted for the following reasons:
    • The original plan was to leave the pilot light running.  This would allow the quickest turnaround in case we wanted to remove the TV so we restore the fireplace to a working condition.  I wanted to protect the wood from the long term effects of the heat of the pilot light on the back which may result in excessive drying of the wood, and potentially be a fire hazard.
    • I thought that a painted surface would be less likely to warp over time.  A warped board would have the potential to allow the magnets to separate from the steel frame of the gas log fireplace, resulting in a catastrophic failure of the system.  Ie., the TV would unexpectedly fall forward into the living room.
  2. To secure the board and TV to the steel firebox I used a total of 14 mounting magnets from K&J Magnets. Ten (10) of the magnets are mounted across the top portion of the board; four (4) are mounted across the lower half of the board.  Each of these little dynamos has a rated pull force of 26.75 pounds.   In practice, the pull force of these magnets was less than under the lab-tested “ideal” conditions (where the magnets are tested against thick steel plates and are aligned perfectly against the surface of the plate).  I didn’t measure the actual pull force required to break away the magnets by pulling on the front pull handles, but I am certain the ten magnets across the top are over-engineered.  The four magnets on the bottom are just there to ensure the board never kicks away from the wall by sliding on the two feet that were used to level the board.
  3. The two sections of aluminum angles that run along the extreme left and right sides serve two purposes:
    • To prevent the board from warping along the vertical axis and,
    • Provides a connection point for the vinyl coated steel cable that connects each of the aluminum pieces.  This cable will be connected to another piece of cable that is on the inside of the fireplace.  In the case of a catastrophic failure, this assembly will at a minimum slow down the fall of the TV and ideally, prevent the TV from ever hitting the floor.
  4. Since the original concept called for the pilot light to remain lighted, I secured a glass backing over a hole (located about 10 inches above the bottom, just to the left of the board center line).  This peep hole would allow us to see the pilot light easily without having to pull the board away from the firebox.  In the end, I decided to not keep the pilot light lit so this feature really is not needed.
  5. There are two additional holes in the board to allow power and HDMI cables to come into the board (lower right, just left of the aluminum brace) and then exit the board to connect to the back of the TV (upper right, just south of the fourth pair of magnets).  In order to maintain a tight seal around the wires but allow some variability in the size of the wires I used 2-inch-thick, compressible Styrofoam.  I was able to cut the material using a hole saw bit by turning the bit in the opposite direction.  This was done by hand, not with a drill.  I was amazed at how clean I was able to get those holes to come out.  In order to get the wires to pass through the foam, I cut a slit from the center to an outer edge.  It worked perfectly.
  6. The last “feature” visible from the back panel is a variety of types of insulation strips used to seal off the firebox.  This mounting board offers the added benefit of saving on cooling and heating by significantly minimizing the amount of air flow from the family room to the firebox and out the chimney.

The following picture shows the mounting board aligned along the bottom of the firebox just hanging by the safety cable.  From this angle it is easy to see the pilot light viewer, and the foam inserts used to seal the wiring.  You can also see that I covered the exposed surfaces of the mounting magnets with masking tape.  This is to prevent the magnets from scratching the painted steel surface of the firebox.  This was suggested by support at K&J.

IMG_1197

This is what the externally facing board looks like.  I finished the board with a black paint that is used to paint exterior doors.  It is very durable and very black.  The mounting bracket was purchased from Sewell Direct.  My main requirement for this bracket was that it allow an upward tilt.  Otherwise, it is a standard mount.  This image also shows that the top handles have also been removed, leaving only the bottom handles.

IMG_1198

Finally, the finished product.  As far as I can tell, the wife is happy… As am I.  My only concern at this point….  The TV is too small for the fireplace mantle.  That 40 incher could easily be upgraded to a 50 incher to at least cover the mounting board on the left and right sides.

IMG_1320

For the record, I didn’t do this alone.  I had several email conversations with K&J regarding the feasibility of using magnets for this project.  Not only was I concerned about the strength of the magnets, but also interference on the TV itself.  So far, I have not had any issues with either.

For those that are more interested in where all of the electronics is… I have a dedicated closet on the other side of the wall to the left of the fireplace.  The folks at Sewell Direct greatly assisted me with selecting the proper infrared (IR) product and flat/paintable HDMI cabling.

For the audio buffs, the speakers were purchased new in 1988.  They are Magnat Deltas.  Unfortunately, Magnat no longer sells in the USA.  After all of the years I have no desire to replace them.

0 Comments

10 Tips to Optimize Data Warehouse Reporting

Unless your marketing data warehouse is stored in a massively parallel appliance such as Netezza, your data most likely resides on a Symmetric Multi-Processing (SMP) computer in a relational database engine, such as Microsoft SQL Server.

SQL Server is a highly scalable, versatile database product that can serve the needs of a wide range of applications, from the tiniest 100MB database all the way up to multi-terabyte data warehouses.  Along with its versatility comes the responsibility of the data architects to optimize the configuration of the database for the specific application.  The configuration of a transactional database, such as an order processing system, will look much different than a heavy hitting operational data warehouse.

For large-scale data warehouses, such as marketing automation databases, optimization often means providing the end users with the fastest reporting times possible.  Slow response times will quickly frustrate them and, inevitably, result in your phone ringing off the hook.

If you are running a large data warehouse, and don’t like having the red phone light up all day long, there are a few simple rules that must be followed in order to keep your customers happy:

  1. Spread your data across as many physical disk drives as possible and as evenly as possible.  If you are building a large data warehouse, don’t put all of your data in a single file group that contains a single file.  In order to do this you will need to:
    • Create multiple file groups.  File GROUPS are logical database containers that are used to hold database files.
    • Create multiple database files for each of the file GROUPS.  These are the physical database files.  Ideally they will be spread across multiple physical drives. 
    • Rule #1: A database RECORD is always contained in a single database file.  It is NEVER spread across multiple database files, even if those database files are in the same file GROUP.
    • Rule #2: A database TABLE is always contained in a single database file GROUP.
    • Rule #3: SQL Server uses a proportional fill algorithm to spread writes across the multiple database files (within a file group).  If all database files in the FILE GROUP are the same size, the same number of records will be written to each individual file (the desired effect!!!).  Hence, always create the database files within a file group with the same size (see the sketch after this list).
  2. Initialize your database files to the max size that will be needed to hold your data.  This is way easier said than done.  It’s very easy to grossly underestimate the amount of data that will be flowing into your data warehouse.  When this happens the data files will auto grow, which is a SLOW process.  Even more important in the context of reporting, the new disk space added to the file group will probably not be contiguous with the original file, which will lead to slower reporting capabilities.
  3. Use the fewest number of threads possible when loading data.  If you set the MAXDOP (max degree of parallelism) to 1 the incoming data will be written to contiguous chunks of disk space.  Data warehouse reporting is typically grabbing lots of related records, usually over a time range.  This results in pulling records that are all loaded at the same time.  This is known as a RANGE SCAN.  Range scans work best when the data is stored on contiguous blocks of disk.  It’s really that simple…  By using a single thread to lay the data down on the disk, we are sacrificing load performance for improved reporting performance.  That’s OK, you only load the data once and hopefully no people are involved in the load process.  The data will be read many times over by people waiting for reports to come out of the system.  A slower load time is a fair trade for a faster user experience any day of the week.
  4. Maintain the highest-quality statistics on your data that you can.  If your tables tend to grow incrementally over time, the auto statistics feature of SQL Server will work fine, for a while.  At some point the low sample rate of the auto stats will result in inaccurate statistics.  For that reason it is absolutely imperative that a rock-solid statistics maintenance plan be put in place (a statistics-refresh sketch follows this list).  And don’t be fooled into thinking that the stock “maintenance plan” wizard in SQL Server will suffice.  What happens if a statistic becomes severely out of date?  The query optimizer will generate a bad plan for the SQL query and the query will appear “hung”.  The consumer of the data will think the server is busy or that you have a bad design, when in fact you just have outdated statistics.
  5. Give SQL Server as much memory as you can afford.  Whatever you give it, it will use.  If your server is a dedicated database server, turn on Lock Pages in Memory.  This prevents the operating system from paging out memory that SQL Server has allocated.  If your server is used for other applications, however, locking memory pages may end up starving those applications of memory, which may end up making the database look slow.
  6. On a related note, separate services whenever possible.  SQL Server Integration Services (SSIS), like the SQL Server engine, will grab as much memory as is available.  If you just gave all of the system memory to SQL Server (minus a few GB so you don’t starve the OS), what will be left for your ETL/loads?  Put SSIS on a separate server.
  7. If you have tables that are loaded once a month and then used for reporting purposes the rest of the month, consider placing these tables in READ ONLY file groups (a read-only file group sketch follows this list).  The SQL engine will reward you with a drastic reduction in locking on your data, and your data consumers will think you are a genius.
  8. For very large tables, say more than 100M records, consider a horizontal partitioning strategy.  SQL Server does not have a nice wizard to set up partitioning; it is very technical to implement and deploy.  But if you have a table with 200M records and usually don’t care about any of the data except for the most recent month, horizontal partitioning will allow correctly designed queries to ignore all but that most recent month’s data.  This is called partition elimination, and it can make a huge difference in query performance (a minimal partitioning sketch follows this list).
  9. Do your housecleaning…  Get rid of indexes that are no longer needed.  Add new indexes sparingly and with great caution.  But do so when it makes sense.  I’ve never seen a perfectly indexed database in the initial design.  Drop “temp”/”test” tables.  Archive old data.  Better yet, deploy an automated archiving process.  There’s no point in keeping 10 years’ worth of data when your end users only need the current year plus last year.
  10. And last but not least, my personal favorite, NEVER SHRINK a database or database file.  If you do, you just fragmented your data and indexes, probably FOREVER.  The only way to get rid of that fragmentation will be to move the data to a new file group that isn’t fragmented.  The catch-22: if you just did a shrink on your database it’s probably because you were running low on disk space.  If you don’t have disk space to auto-grow your existing data files, you probably don’t have the disk space to move the data to a new file group.  The data will remain fragmented, FOREVER.
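
Since several of the rules above are easier to show than to describe, here are a few minimal sketches.  Every object name, path, and size in them is hypothetical; adjust them to your own environment.

For rules 1 and 2: one new file group with two equally sized, pre-grown files on separate physical drives, so proportional fill spreads the writes evenly.

Create a File Group with Equally Sized Files
-- Hypothetical database SalesDW, file group FG_Fact, and drives E: and F:
ALTER DATABASE SalesDW ADD FILEGROUP FG_Fact;

ALTER DATABASE SalesDW
ADD FILE ( NAME = SalesDW_Fact_1
         , FILENAME = 'E:\SQLData\SalesDW_Fact_1.ndf'
         , SIZE = 100GB
         , FILEGROWTH = 10GB )
TO FILEGROUP FG_Fact;

ALTER DATABASE SalesDW
ADD FILE ( NAME = SalesDW_Fact_2
         , FILENAME = 'F:\SQLData\SalesDW_Fact_2.ndf'
         , SIZE = 100GB
         , FILEGROWTH = 10GB )
TO FILEGROUP FG_Fact;

-- Both files are the same size, so proportional fill writes to them evenly.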
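
For rule 4, a statistics-refresh sketch.  The table name is hypothetical; the point is that a scheduled full-scan update beats relying on the low sample rate of auto stats for big, fast-growing tables.

Refresh Statistics with a Full Scan
-- Hypothetical fact table; refresh all of its statistics with a full scan.
UPDATE STATISTICS dbo.FactSales WITH FULLSCAN;

-- Or, the blunt instrument for the whole database (sampled, not full scan):
EXEC sp_updatestats;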
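
For rule 7, flipping a file group to read-only after the monthly load.  Switching a file group to READ_ONLY requires exclusive access to the database, so schedule it along with the load window.

Read-Only File Group After the Load
-- Hypothetical database and file group from the sketch above.
ALTER DATABASE SalesDW MODIFY FILEGROUP FG_Fact READ_ONLY;

-- ...a month of reporting with next to no locking on that data...

ALTER DATABASE SalesDW MODIFY FILEGROUP FG_Fact READ_WRITE;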
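
And for rule 8, the smallest possible monthly partitioning sketch (table, function, and scheme names are all hypothetical).  The final query filters on the partitioning column, so the optimizer can eliminate every partition except March.

Minimal Monthly Partitioning
CREATE PARTITION FUNCTION pf_Monthly (date)
    AS RANGE RIGHT FOR VALUES ('2012-01-01', '2012-02-01', '2012-03-01');

CREATE PARTITION SCHEME ps_Monthly
    AS PARTITION pf_Monthly ALL TO ([PRIMARY]);

CREATE TABLE dbo.FactOrders
(     OrderDate    date    NOT NULL
    , CustomerID   int     NOT NULL
    , Amount       money   NOT NULL )
ON ps_Monthly (OrderDate);

-- Partition elimination: only the March partition is touched.
SELECT    SUM(Amount)
FROM      dbo.FactOrders
WHERE     OrderDate >= '2012-03-01'
AND       OrderDate <  '2012-04-01';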

Your reporting customers want their data as simply and as quickly as possible.  Making that happen only requires following some basic design and maintenance strategies.

What strategies have you employed to avoid a data scare-house?


How to List Size of Tables

Even though disk space is cheap it continues to be the first resource to run out on a server.  And once it runs out it’s a constant battle to keep the database size in check.

For sure there are things that can be done to keep database growth in control:

  • Archive.  I’ve seen databases with six years’ worth of data in them when our contract with the client only required four years.  That means we are storing 50 percent more data than we need to.  Why don’t we set up an automatic archival process?  Probably because we don’t have a framework that we can drop in place, so we have to build it from scratch.  With all of the day-to-day operational demands, who has the time to build archive systems?  That’s fodder for another post…
  • Sliding Window Partitions.  This is similar to archiving, with a twist.  It requires a fairly sophisticated partitioning process where, as new data comes rolling in, we automatically roll the oldest data out of the table.  Not only does this maintain a consistent table size, but in theory the data that just “slid out of the table” goes to archive storage (i.e., slower and cheaper media).  There would be more sliding windows deployed if there were a wizard-driven process to create the partitions and all of the supporting functions.  Since that is not the case, there will probably continue to be plenty of opportunities for consultants specializing in partitioning to speed up ETL and report queries.
  • Junk/temp tables.  We all do it, so fess up.  We’re on the production server and we create a “temp” table, except it’s not in tempdb.  It’s a permanent table.  Temp tables are usually easy to spot.  They look like tmp_data_im_loading_toady, dgt_new_customer_fulfillments, temp_20120215, etc.  You need to be a little careful deleting these tables because I have seen ETL processes that actually rely on them… so they’re not really temp, are they?  Identifying junk tables has saved us GIGs of space.
  • Too many indexes.  I’ve seen tables where the sum of the non-clustered indexes was three times the size of the data itself.  If you come across this, chances are there’s something wrong.  Not only are you wasting space to maintain these indexes, but ETL loads will need to update them, slowing down the loads.  The optimizer will also have a ton of indexes to consider (and probably never use) for basic queries, making query plan generation that much more complex and costly (see the index usage query after this list).
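
As a starting point for the index cleanup mentioned in the last bullet, here is one way to spot non-clustered indexes that have not been read since the last service restart but are still being maintained on every write.  It only uses the standard sys.indexes and sys.dm_db_index_usage_stats views; treat the output as candidates for review, not an automatic drop list.

Non-Clustered Indexes with No Reads
SELECT    OBJECT_NAME(i.[object_id]) AS [TableName]
        , i.[name] AS [IndexName]
        , us.user_updates AS [WritesSinceRestart]
FROM        sys.indexes i
        LEFT OUTER JOIN sys.dm_db_index_usage_stats us
            ON    us.[object_id] = i.[object_id]
            AND    us.index_id = i.index_id
            AND    us.database_id = DB_ID()
WHERE    i.index_id > 1        -- non-clustered indexes only
AND        i.[object_id] > 100
AND        ISNULL(us.user_seeks, 0) + ISNULL(us.user_scans, 0) + ISNULL(us.user_lookups, 0) = 0;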

Today’s query will list the tables in every user database on an instance.  Included in the result set is the size of the heap or clustered index (which is essentially the size of the data, since the index component of a clustered index is on the order of 1-2 percent of the total size).  It also lists the number and aggregate size of non-clustered indexes.

Using this data you will be able to review your table structures, identify junk tables, and identify tables that probably have unnecessary non-clustered indexes.  What you do with this information all depends on whether you are swamped with operational fires or can actually take the time to clean house.

I’m including the code in this post but will also make it available in the downloads section in the sidebar.

List of Tables by Storage
EXEC sp_MSforeachdb
'    USE ?;
        
    IF db_id() > 4    
    BEGIN         
        WITH nci AS (    SELECT    i.[object_id]
                                , COUNT(i.[object_id]) AS [NumIndexes]
                                , CAST((SUM(au.total_pages) * (8.0 * 1024.0)) / 1000000000.0 AS NUMERIC(12, 4)) AS GBytes
                        FROM        sys.indexes i
                                INNER JOIN sys.partitions p
                                    ON    p.[object_id] = i.[object_id]
                                    AND    p.index_id = i.index_id
                                INNER JOIN sys.allocation_units au
                                    ON    au.container_id = p.partition_id
                        WHERE    i.index_id >= 2
                        AND        i.[object_id] > 100
                        GROUP BY i.[object_id] )

        SELECT    db_name() AS DBName
                , i.object_id
                , OBJECT_NAME(i.object_id) AS [TableName]
                , i.name AS [ClusteredIndexName]
                , i.index_id
                , i.type_desc AS index_type
                , SUM(p.[rows]) AS row_cnt
                , CAST((SUM(au.total_pages) * (8.0 * 1024.0)) / 1000000000.0 AS NUMERIC(12, 4)) AS CI_GBytes
                , nci.[NumIndexes]
                , nci.GBytes AS [NCI_IndexGBytes]
                , CAST((SUM(au.total_pages) * (8.0 * 1024.0)) / 1000000000.0 AS NUMERIC(12, 4)) + nci.GBytes AS [TotalTableSize]
        FROM        sys.indexes i
                INNER JOIN sys.partitions p
                    ON    p.[object_id] = i.[object_id]
                    AND    p.index_id = i.index_id
                INNER JOIN sys.allocation_units au
                    ON    au.container_id = p.partition_id
                INNER JOIN nci nci
                    ON    nci.[object_id] = i.[object_id]
        WHERE    i.index_id <= 1
        AND        i.[object_id] > 100
        GROUP BY i.[object_id]
                ,    i.name
                , i.index_id
                , i.type_desc
                , nci.NumIndexes
                , nci.GBytes
        ORDER BY CAST((SUM(au.total_pages) * (8.0 * 1024.0)) / 1000000000.0 AS NUMERIC(12, 4)) + nci.GBytes DESC
    END';

Stay curious; keep learning…


Size of Filegroups and Files

In recent posts I looked at queries that show us data at the instance level, the database level, and the backups.  Now we will look at filegroups and the individual physical files.

Too many times disk space is not an issue, it is THE issue.  This query will show several key attributes of our database at the file level including:

  • List of filegroups for a database
  • List of files that are members of the filegroup
  • The amount of space ALLOCATED for the data file
  • The amount of space USED/FREE for the data file
  • The amount of remaining FREE SPACE for the drive that the file is on

Too many times we have a drive that is running out of storage and need to move tables/indexes to files on other drives.  This query is a great starting point for identifying which files are in trouble.  It also gives us a bird’s-eye view of other drives that may be candidates for relocating crowded tables.

Filegroups and Files
--drop table #sfs
--drop table #fixed_drives
--drop table #output_table
--drop table #databases
--drop table #dbf
--drop table #fg

---------------------------------------
-- Save result set from showfilestats
---------------------------------------
CREATE TABLE #sfs(    fileid            tinyint
                    , filegroupid    tinyint
                    , totalextents    int
                    , usedextents    int
                    , dbfilename    sysname
                    , physfile        varchar(255) );
    
--------------------------------------------
-- Save result set from sys.database_files
--------------------------------------------
CREATE TABLE #dbf(    [file_id]                int
                    , file_guid                uniqueidentifier
                    , [type]                tinyint
                    , type_desc                nvarchar(60)
                    , data_space_id            int
                    , [name]                sysname
                    , physical_name            nvarchar(260)
                    , [state]                tinyint
                    , state_desc            nvarchar(60)
                    , size                    int
                    , max_size                int
                    , growth                int
                    , is_media_ro            bit
                    , is_ro                    bit
                    , is_sparse                bit
                    , is_percent_growth        bit
                    , is_name_reserved        bit
                    , create_lsn            numeric(25, 0)
                    , drop_lsn                numeric(25, 0)
                    , read_only_lsn            numeric(25, 0)
                    , read_write_lsn        numeric(25, 0)
                    , diff_base_lsn            numeric(25, 0)
                    , diff_base_guid        uniqueidentifier
                    , diff_base_time        datetime
                    , redo_start_lsn        numeric(25, 0)
                    , redo_start_fork_guid    uniqueidentifier
                    , redo_target_lsn        numeric(25, 0)
                    , redo_target_fork_guid    uniqueidentifier
                    , back_lsn                numeric(25, 0) );
                    
--------------------------------------------
-- Save result set from sys.filegroups  (select * from sys.filegroups)
--------------------------------------------
CREATE TABLE #fg(    [name]                sysname
                    , data_space_id        int
                    , [type]            char(2)
                    , type_desc            nvarchar(60)
                    , is_default        bit
                    , [filegroup_id]    uniqueidentifier
                    , log_filegroup_id    int
                    , is_read_only        bit );
                    
-- Populate #fixed_drives with data
CREATE TABLE #fixed_drives(DriveLetter    char(1) NOT NULL
                            , FreeMB    int NOT NULL );

INSERT INTO #fixed_drives
    EXEC master..xp_fixeddrives;
    
CREATE TABLE #output_table(DatabaseName    sysname
                            , FG_Name        sysname
                            , GB_Allocated    numeric(8, 2)
                            , GB_Used        numeric(8, 2)
                            , GB_Available    numeric(8, 2)
                            , DBFilename    sysname
                            , PhysicalFile    sysname
                            , Free_GB_on_Drive    numeric(8, 2) );
                            
SELECT    name AS DBName
INTO    #databases
FROM    sys.databases
WHERE    database_id > 4
AND        state_desc = 'ONLINE';

DECLARE    @dbname    sysname;

SELECT    @dbname =(    SELECT    TOP (1) DBName FROM    #databases);
DELETE FROM #databases WHERE DBName = @dbname;

WHILE @dbname IS NOT NULL
BEGIN

    -- Get the file group data
    INSERT INTO #sfs            
        EXEC('USE ' + @dbname + '; DBCC showfilestats;');
        
    INSERT INTO #dbf
        EXEC('USE ' + @dbname + '; SELECT * FROM sys.database_files;');
        
    INSERT INTO #fg
        EXEC('USE ' + @dbname + '; SELECT * FROM sys.filegroups;');
        
    -- Wrap it up!
    INSERT INTO #output_table    (DatabaseName
                                , FG_Name
                                , GB_Allocated
                                , GB_Used
                                , GB_Available
                                , DBFilename
                                , PhysicalFile
                                , Free_GB_on_Drive )
        SELECT    @dbname AS DATABASE_NAME
                , fg.name AS [File Group Name]
                , CAST(((sfs.totalextents * 64.0) / 1024000.0) AS numeric(8, 2)) AS GB_Allocated
                , CAST(((sfs.usedextents * 64.0) / 1024000.0) AS NUMERIC(8, 2)) AS GB_Used
                , CAST((((sfs.totalextents - sfs.usedextents) * 64.0) / 1024000.0) AS NUMERIC(8, 2)) AS GB_Available
                , sfs.dbfilename
                , sfs.physfile
                , CAST((fd.FreeMB / 1000.0) AS NUMERIC(8, 2)) AS Free_GB_on_Drive
        FROM        #sfs sfs
                INNER JOIN #dbf dbf
                    ON    dbf.[file_id] = sfs.fileid
                INNER JOIN #fg fg
                    ON fg.data_space_id = sfs.filegroupid
                INNER JOIN #fixed_drives fd
                    ON    fd.DriveLetter = SUBSTRING(sfs.physfile, 1, 1);
    
    SELECT    @dbname =(    SELECT    TOP (1) DBName FROM    #databases);
    IF @dbname IS NOT NULL
        DELETE FROM #databases WHERE DBName = @dbname;
        
    TRUNCATE TABLE #sfs;
    TRUNCATE TABLE #dbf;
    TRUNCATE TABLE #fg;
END

SELECT    CONVERT(int, CONVERT(char, current_timestamp, 112)) AS CaptureDate
        , DatabaseName
        , FG_Name
        , GB_Allocated
        , GB_Used
        , GB_Available
        , DBFilename
        , PhysicalFile
        , Free_GB_on_Drive
FROM    #output_table
ORDER BY DatabaseName
        , FG_Name

 

I’ll add the code for this to the downloads on the right side of this post…

Hopefully you’ll find this query useful for getting a detailed picture of how your databases are physically laid out.  Without this info it’s impossible to figure out how to reconfigure / redistribute your data.

Stay curious; keep learning…

Database Backup Snapshot

 

Every Monday morning I make it a point to run some basic checks on our production database servers.  Yes, there are alerts… but sometimes those alerts get overlooked, don’t get delivered, etc.  I like to know implicitly that we are in decent shape…

In previous posts I showed how to look at the server at the instance level:

http://www.daveturpin.com/2011/04/dba-query-instance-information/

And at the database level:

http://www.daveturpin.com/2011/07/list-of-databases/

Today I will show how I take a look at the database backups… probably THE MOST IMPORTANT service a DBA provides to the organization.  Without good backups a loss of data is a real possibility… which would probably be followed up with a loss of a job…

Reviewing database backups can be tricky because we have three backup varieties to consider: FULL, DIFFERENTIAL and TRANSACTION LOG.  FULL backups apply to all databases.  DIFFERENTIAL backups definitely do not apply to the system databases and may apply to user databases.  TRANSACTION LOG backups do not apply to databases in SIMPLE recovery mode.

To tame this beast, I wrote a script that includes the recovery model and the number of days since the last FULL, DIFF and TRN backups.  If a database has never had a particular backup type, the corresponding “days since” column will be minus one (-1); databases in SIMPLE recovery simply show NULLs in the log backup columns.

So here’s the query:

Last Backups
-- Get the last FULL backup
-- drop table #full
SELECT    d.database_id
        , d.recovery_model AS recovery_model_id
        , bus.backup_set_id
        , d.Name as DatabaseName
        , 'FULL' AS [Backup Type]
        , Convert(varchar(12), bus.backup_finish_date, 101)    AS LastBackUpTaken
        , CAST(NULL AS numeric(8, 2)) AS [Backup Time(min)]
        , Convert(varchar(12), bus.[user_name], 101)    AS UserName
INTO #FULL
FROM        sys.databases d
        INNER JOIN msdb.dbo.backupset bus
            ON    bus.database_name = d.name
            AND    bus.[type] = 'D'
WHERE    bus.backup_set_id IN (    SELECT    TOP 1
                                        bus2.backup_set_id
                                FROM        msdb.dbo.backupset bus2
                                WHERE    bus2.database_name = d.name
                                AND        bus2.[type] = 'D'
                                ORDER BY bus2.backup_set_id DESC)
UNION

-- Identify any dbs that never had a backup taken.
SELECT    d.database_id
        , d.recovery_model AS recovery_model_id
        , bus.backup_set_id
        , d.Name as DatabaseName
        , 'FULL' AS [Backup Type]
        , 'No FULL Backups'
        , CAST(NULL AS numeric(8, 2)) AS [Backup Time(min)]
        , Convert(varchar(12), bus.[user_name], 101)    AS UserName
FROM        sys.databases d
        LEFT OUTER JOIN msdb.dbo.backupset bus
            ON    bus.database_name = d.name
            AND    bus.[type] = 'D'
WHERE    bus.database_name IS NULL;

---- Differential Backups ----
-- Get the last DIFF backup
-- drop table #DIFF
SELECT    d.database_id
        , d.recovery_model AS recovery_model_id
        , bus.backup_set_id
        , d.Name as DatabaseName
        , 'DIFF' AS [Backup Type]
        , Convert(varchar(12), bus.backup_finish_date, 101)    AS LastBackUpTaken
        , CAST(NULL AS numeric(8, 2)) AS [Backup Time(min)]
        , Convert(varchar(12), bus.[user_name], 101)    AS UserName
INTO #DIFF
FROM        sys.databases d
        INNER JOIN msdb.dbo.backupset bus
            ON bus.database_name = d.name
WHERE    bus.[type] = 'I'
AND        bus.backup_set_id IN (    SELECT    TOP 1
                                        bus2.backup_set_id
                                FROM        msdb.dbo.backupset bus2
                                WHERE    bus2.database_name = d.name
                                AND        bus2.[type] = 'I'
                                ORDER BY bus2.backup_set_id DESC)
                                
UNION

-- Identify any dbs that never had a backup taken.
SELECT    d.database_id
        , d.recovery_model AS recovery_model_id
        , bus.backup_set_id
        , d.Name as DatabaseName
        , 'DIFF' AS [Backup Type]
        , 'No DIFF Backups'
        , CAST(NULL AS numeric(8, 2)) AS [Backup Time(min)]
        , Convert(varchar(12), bus.[user_name], 101)    AS UserName
FROM        sys.databases d
        LEFT OUTER JOIN msdb.dbo.backupset bus
            ON    bus.database_name = d.name
            AND    bus.[type] = 'I'
WHERE    bus.database_name IS NULL;

---- Log Backups ----
-- drop table #LOG
SELECT    d.database_id
        , d.recovery_model AS recovery_model_id
        , bus.backup_set_id
        , d.Name as DatabaseName
        , 'LOG' AS [Backup Type]
        , Convert(varchar(12), bus.backup_finish_date, 101)    AS LastBackUpTaken
        , CAST(NULL AS numeric(8, 2)) AS [Backup Time(min)]
        , Convert(varchar(12), bus.[user_name], 101)    AS UserName
INTO #LOG
FROM        sys.databases d
        INNER JOIN msdb.dbo.backupset bus
            ON bus.database_name = d.name
WHERE    bus.[type] = 'L'
AND        d.recovery_model = 1
AND        bus.backup_set_id IN (    SELECT    TOP 1
                                        bus2.backup_set_id
                                FROM        msdb.dbo.backupset bus2
                                WHERE    bus2.database_name = d.name
                                AND        bus2.[type] = 'L'
                                ORDER BY bus2.backup_set_id DESC)
                                
UNION

-- Identify any dbs that never had a backup taken.
SELECT    d.database_id
        , d.recovery_model AS recovery_model_id
        , bus.backup_set_id
        , d.Name as DatabaseName
        , 'LOG' AS [Backup Type]
        , 'No LOG Backups'
        , CAST(NULL AS numeric(8, 2)) AS [Backup Time(min)]
        , Convert(varchar(12), bus.[user_name], 101)    AS UserName
FROM        sys.databases d
        LEFT OUTER JOIN msdb.dbo.backupset bus
            ON    bus.database_name = d.name
            AND    bus.[type] = 'L'
WHERE    bus.database_name IS NULL
AND        d.recovery_model = 1;

-- Get the backup run time:
UPDATE    t
SET        [Backup Time(min)] = (DATEDIFF(ms, bus.backup_start_date, bus.backup_finish_date) * 1.0) / (60 * 1000)
FROM        #FULL t
        INNER JOIN msdb.dbo.backupset bus
            ON bus.backup_set_id = t.backup_set_id;
            
UPDATE    t
SET        [Backup Time(min)] = (DATEDIFF(ms, bus.backup_start_date, bus.backup_finish_date) * 1.0) / (60 * 1000)
FROM        #DIFF t
        INNER JOIN msdb.dbo.backupset bus
            ON bus.backup_set_id = t.backup_set_id;
            
UPDATE    t
SET        [Backup Time(min)] = (DATEDIFF(ms, bus.backup_start_date, bus.backup_finish_date) * 1.0) / (60 * 1000)
FROM        #LOG t
        INNER JOIN msdb.dbo.backupset bus
            ON bus.backup_set_id = t.backup_set_id;

-- Pull it all together:
SELECT    CONVERT(int, CONVERT(char, current_timestamp, 112)) AS CaptureDate
        , f.database_id
        , f.DatabaseName
        , CASE f.recovery_model_id
            WHEN 1 THEN 'FULL'
            WHEN 2 THEN 'BULK'
            WHEN 3 THEN 'SIMPLE'
            ELSE 'Unknown'
          END AS [Recovery_Model]
         , f.backup_set_id AS [FULL_Backup_Set_ID]
         , f.LastBackUpTaken AS [Last_FULL_Backup_Date]
         , DATEDIFF(DAY,
                    CAST(    CASE f.LastBackUpTaken
                                WHEN 'No FULL Backups' THEN DATEADD(DAY, 1, CURRENT_TIMESTAMP)
                                ELSE f.LastBackUpTaken
                            END AS SMALLDATETIME),
                    CURRENT_TIMESTAMP) AS [#Days Since Last FULL]
         , f.[Backup Time(min)] AS [FULL_Backup_Time_min]
         , f.UserName AS [FULL_Backup_Operator]
         
         , d.backup_set_id AS [DIFF_Backup_Set_ID]
         , d.LastBackUpTaken AS [Last_DIFF_Backup_Date]
         ,  DATEDIFF(DAY,
                    CAST(    CASE d.LastBackUpTaken
                                WHEN 'No DIFF Backups' THEN DATEADD(DAY, 1, CURRENT_TIMESTAMP)
                                ELSE d.LastBackUpTaken
                            END AS SMALLDATETIME),
                    CURRENT_TIMESTAMP) AS [#Days Since Last DIFF]
         , d.[Backup Time(min)] AS [DIFF_Backup_Time_min]
         , d.UserName AS [DIFF_Backup_Operator]
         
         , l.backup_set_id AS [LOG_Backup_Set_ID]
         , l.LastBackUpTaken AS [Last_LOG_Backup_Date]
         ,  DATEDIFF(DAY,
                    CAST(    CASE l.LastBackUpTaken
                                WHEN 'No LOG Backups' THEN DATEADD(DAY, 1, CURRENT_TIMESTAMP)
                                ELSE l.LastBackUpTaken
                            END AS SMALLDATETIME),
                    CURRENT_TIMESTAMP) AS [#Days Since Last LOG]
         , l.[Backup Time(min)] AS [LOG_Backup_Time_min]
         , l.UserName AS [LOG_Backup_Operator]
FROM        #FULL f
        LEFT OUTER JOIN #DIFF d ON d.database_id = f.database_id
        LEFT OUTER JOIN #LOG l ON l.database_id = f.database_id
-- Exclude tempdb
WHERE    f.database_id <> 2
ORDER BY f.database_id;

 

OK, it’s a little long… but worth the effort.  If you manage more than one database server, which most DBAs do, you can plug this query into a Central Management Server query window and get a snapshot of all of your database backups in one easy view.

I like to put the result set into an Excel file and filter on the recovery model to make sure my FULL recovery databases are running transaction log backups.  How many times have you answered the call from a developer or end user complaining about a full transaction log?  I’m happy to say that because of this simple review process, I never get the call for PRODUCTION trans log backup issues due to backups not running.  (I’m not as diligent with the DEV and TEST servers, so yes, sometimes stuff does happen there.)

Note: I did add the T-SQL scripts for the INSTANCE, DATABASE and BACKUP queries to the download section of this blog.  Apparently this code widget is not very good at copying out code.

If you’ve been slack with reviewing your backups you might just find some nuggets of interest in the result set from this script…

Stay curious… keep learning…


SQL Server Backup Compression Default

In SQL Server 2008 Microsoft added the long-overdue backup compression feature.  You may have noticed the pull-down menu in the backup dialog box to use the DEFAULT SERVER SETTING for compression.

[screenshot: the backup compression drop-down in the backup dialog]

To determine what the DEFAULT value actually is:

[screenshot: checking the current 'backup compression default' value]
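
If you prefer a query to the dialog, the server-level setting can also be read from the sys.configurations catalog view (a value_in_use of 0 means compression is off by default):

Check the Backup Compression Default
SELECT    name
        , value
        , value_in_use
FROM    sys.configurations
WHERE    name = 'backup compression default';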

Out of the box, SQL Server 2008 backup compression is turned OFF.  To avoid the need to select COMPRESSION every time a backup is taken (assuming that is your intent), you can enable backup compression at the server level:

Enable Backup Compression:
  1. sp_configure 'backup compression default', 1
  2. reconfigure
  3. go

This will simply allow you to not have to worry about explicitly selecting the backup COMPRESSION option in the backup dialog box.
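
For reference, the same option can also be set on an individual backup regardless of the server default; a quick sketch (the database name and backup path below are just placeholders):

Backup with Compression
-- COMPRESSION here overrides the server-level default for this one backup.
BACKUP DATABASE AdventureWorks
TO DISK = 'E:\Backups\AdventureWorks.bak'
WITH COMPRESSION, INIT, STATS = 10;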

If all of your servers are in SQL Server’s new CENTRAL MANAGEMENT SERVER, you can enable backup compression on all of the servers by executing the above command ONE time.

Stay curious… keep learning…


Monitor Transaction Log Rollback

I recently had a situation where an index was being updated but the process hung in the middle of the job.  Apparently this isn’t all that far-fetched, since it appears it has happened to other folks out there (the internet is the best… I rarely have a NEW problem)…

After I stopped the query it went into a rollback.  A rollback that wouldn’t end.

How long should the query be rolling back?  How can I tell if the rollback is actually doing something, or is this thing never going to finish?

It turns out you can monitor the bytes reserved in the transaction log during a rollback.  As long as the reserved bytes are getting smaller, your transaction rollback is working.

In my case, the rollback was NOT working and the reserved bytes remained flat.  The workaround, unfortunately, was to restart the SQL Server service.

Here’s the simple query to test the reserved bytes:

    select      current_timestamp as tod,
                database_transaction_log_bytes_reserved,
                database_transaction_log_bytes_used
    from  sys.dm_tran_database_transactions
    where transaction_id > 1000
    and database_id = 7 -- AdventureWorks
    -- and transaction_id = whatever it is…

Just to prove the point, I’ll demonstrate how it works by running a transaction in AdventureWorks.

The first thing I’ll do is set up a query to collect the bytes reserved every 1 second.  The results will be saved to a temp table with the highly original name of #t.

Save Bytes Reserved
create table #t (    tod    datetime
                    , bytes_reserved    int
                    , bytes_used        int )
    
while 1=1
begin    
    insert into #t            
    select      current_timestamp as tod,
                database_transaction_log_bytes_reserved,
                database_transaction_log_bytes_used
    from  sys.dm_tran_database_transactions
    where transaction_id > 1000
    and database_id = 7 -- AdventureWorks
    -- and transaction_id = whatever it is…
    
    waitfor delay '00:00:01'
end

 

Now I’ll do a little update in AdventureWorks, then roll back the transaction.

Kick off Xaction and Rollback
begin tran

    -- 16s
    update Sales.SalesOrderDetail
    set ModifiedDate = CURRENT_TIMESTAMP
    
rollback    -- 2 s

 

The update took about 16 seconds to run, followed by a few-second pause before I ran the rollback, which took 2 seconds.

Here’s what the result set looks like:

[screenshot: the result set, showing bytes reserved over time]

 

As the update transaction runs we see the transaction log bytes reserved increasing.  Then there is a delay before I issued the rollback… Finally, the last two entries show the reserved bytes decreasing.

Although this tool may not tell you how LONG the ROLLBACK will take, it will at least tell you that it is in fact ROLLING back and is not “hung”.

You can narrow down the query a bit by adding the transaction ID to ensure you are monitoring the transaction you need to look at.
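
If you are not sure which transaction_id belongs to the session doing the rollback, one way to look it up is sys.dm_tran_session_transactions (the session id 57 below is just a placeholder; use the SPID of the rolling-back session):

Find the Transaction ID for a Session
SELECT    session_id
        , transaction_id
FROM    sys.dm_tran_session_transactions
WHERE    session_id = 57;    -- hypothetical SPID of the rolling-back session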

Stay curious… Keep learning…

List of Databases

Back in April I posted a query that shows instance information.  This post is the next level down in the hierarchy that I collect on my database platforms each week…  The databases in an instance.

This query is certainly one of the more straightforward queries for capturing database data.  It probably doesn’t really need to be executed / collected weekly, but I do it anyway to be consistent with my collection process.

If you plug this query into a SQL Server Central Management server you’ll have a snapshot of all your instances that are part of the CMS.

This query will return the database name, the overall size of the database including all data and log files, the create date, owner, compatibility level, online vs offline, and update stats info.

It’s interesting to find databases owned by staff that are no longer around, compatibility levels set to old versions for no obvious reason, and auto create/update stats turned off.

Like they say on the History Channel’s Pawn Stars show… you never know WHAT will be coming through that door.

So here’s the query:

SELECT    CONVERT(int, CONVERT(char, current_timestamp, 112)) AS CaptureDate
        , d.database_id
        , d.[name]
        , CAST(SUM(mf.size * 8192.0 / 1000000000) AS NUMERIC(18, 2)) AS [db_size (G)]
        , d.recovery_model_desc AS [Recovery Model]
        , d.create_date
        , suser_sname(d.owner_sid) AS [Owner]
        , d.[compatibility_level]
        , d.state_desc AS [State]
        , d.is_auto_create_stats_on
        , d.is_auto_update_stats_on
FROM        sys.databases d
        INNER JOIN sys.master_files mf
            ON    mf.database_id = d.database_id
WHERE    d.database_id > 4    -- Exclude the system databases.
GROUP BY d.database_id
        , d.[name]
        , d.recovery_model_desc
        , d.create_date
        , d.owner_sid
        , d.[compatibility_level]
        , d.state_desc
        , d.is_auto_create_stats_on
        , d.is_auto_update_stats_on

My next post in this series will be on database backups, which is a much more interesting query AND really does need to be monitored weekly, minimum.

Stay curious… keep learning…
