
Category Archives: Performance MSSQL

Improve SQL Server Performance with Compression

SQL Server data compression is now available in all editions of SQL Server, starting with 2016 SP1.

In this article, you will not only learn how data compression saves space, you’ll also find out how compression can sometimes improve performance as well.

Space Savings vs. Performance

When I first heard about compression back in 2008, my first thought was that it would carry a big performance penalty. Compression would save disk space, but it would probably decrease performance as the data was compressed and decompressed. It turns out that compression can improve performance instead. Because compressed data fits in a smaller number of data pages, fewer I/O operations are required. Since I/O is generally the bottleneck in SQL Server, this can improve performance. Compressed data also has a smaller memory footprint: when querying compressed data, fewer pages are copied to the buffer pool. The one resource that pays a price is CPU. You do need some CPU headroom, because compression requires additional CPU resources.


The good thing is that if the workload is reasonably tuned, many SQL Server instances have more CPU resources than they need. One note of caution here: don’t look to compression as the solution to major performance issues. You still need to look at physical resources, configuration, indexing, and query tuning. The point I’m trying to make is that decreased I/O and better memory utilization will benefit the workload in many cases.

Availability

Compression has been available in SQL Server since 2008, but only in Enterprise Edition. Beginning with SQL Server 2016 SP1, it’s available in Enterprise, Standard, and Express. I’m excited about this because compression is now within the reach of any organization.

Compression Types

There are two types of compression that you can use to compress a table, index, or even a partition:

  • Row Level:

  • Row level compression works by storing fixed-width data types as variable-length data types. Nulls and zeroes do not take any space. For example, in an uncompressed table, an integer column takes 4 bytes per row, even for rows that hold a small number such as 1 or 10, or even null. Once row compression is implemented on the table, each value is stored in the smallest possible number of bytes. So, outside of the metadata, storing a 1 in an integer column takes 1 byte, and storing a null or 0 takes 0 bytes.


  • Starting with SQL Server 2008 R2, Unicode compression is implemented when you use row compression. It applies to Unicode columns of the nchar and nvarchar types (but not nvarchar(max)). In an uncompressed table, each character in a Unicode column takes up 2 bytes, even if a small character set is used. In those cases, compressing the table will store those characters in 1 byte instead.


  • In the technical article, Data Compression: Strategy, Capacity Planning, and Best Practices, Microsoft recommends using row compression on all data, as long as you have 10% extra CPU capacity, and, of course, as long as you achieve some space savings.
  • It incurs only a low CPU penalty.
  • Page Level:

  • When you implement page level compression, the rows are automatically row-compressed first.
  • Page level compression also removes repeated data within a page via two mechanisms: prefix and dictionary compression.
  • Page level compression can compress a table to a smaller size than row compression, but it is recommended for tables that are mostly inserted into and not updated very often.
  • It does have a higher CPU penalty.
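Before choosing a compression type, it’s worth estimating the savings each would give you. Here is a minimal sketch (the table name matches the demo later in this article; the index name is hypothetical) using the built-in sp_estimate_data_compression_savings procedure, followed by the rebuild statements that actually apply compression:

-- Estimate how much space PAGE compression would save
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'bigTransactionHistory',  -- example table
    @index_id         = NULL,                     -- NULL = all indexes
    @partition_number = NULL,                     -- NULL = all partitions
    @data_compression = 'PAGE';

-- Apply ROW compression to the table (and its clustered index)
ALTER TABLE dbo.bigTransactionHistory REBUILD WITH (DATA_COMPRESSION = ROW);

-- Apply PAGE compression to one index (index name is hypothetical)
ALTER INDEX IX_bigTransactionHistory_ProductID ON dbo.bigTransactionHistory
REBUILD WITH (DATA_COMPRESSION = PAGE);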

Neither of these compression types will work on row-overflow data. This is data from a row that exceeds the 8 KB page size. For example, you can create a table that has two varchar(8000) columns. If a row exceeds 8,060 bytes, SQL Server will move one or more of those columns to another page so that the row fits. One way to get around this limitation is the COMPRESS function, introduced in SQL Server 2016.

COMPRESS Function

  • You can use this function to compress individual values, which includes row-overflow data.
  • It uses the GZIP algorithm to compress the values. There is a downside, however: the COMPRESS function must be applied each time a value is inserted or updated, and the DECOMPRESS function must be used to read the value back. This means changes to the application or stored procedures.
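Here is a minimal sketch of that pattern (the table and column names are hypothetical). COMPRESS returns varbinary(max), and DECOMPRESS also returns varbinary(max), so the result must be cast back to the original string type:

-- Hypothetical table storing a compressed large text value
CREATE TABLE dbo.Notes
(
    NoteId   INT IDENTITY PRIMARY KEY,
    NoteText VARBINARY(MAX)  -- compressed payload
);

-- Compress on write
INSERT INTO dbo.Notes (NoteText)
VALUES (COMPRESS(N'a very long string that would otherwise overflow the row'));

-- Decompress on read, casting back to the original type
SELECT NoteId,
       CAST(DECOMPRESS(NoteText) AS NVARCHAR(MAX)) AS NoteText
FROM dbo.Notes;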

Performance Metrics

In this demo, we’ll do some performance comparisons. We’ll take a close look at I/O and memory impact. We’ll see how compression affects both reads and writes to the data.

  • Select comparison

I created two tables: one with page compression called “bigTransactionHistoryPAGE” and one with row compression called “bigTransactionHistoryROW”. I also created a table with no compression called “bigTransactionHistory” that I’ll use during this demonstration as well. Finally, I populated the two compressed tables with around 31 million records.
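The full setup script isn’t shown here, but the compressed tables might be created along these lines (a sketch; the column list is abbreviated and assumed):

CREATE TABLE dbo.bigTransactionHistoryROW
(
    TransactionID BIGINT NOT NULL,
    ProductID     INT    NOT NULL,
    Quantity      INT    NOT NULL,
    CONSTRAINT PK_bigTransactionHistoryROW PRIMARY KEY CLUSTERED (TransactionID)
) WITH (DATA_COMPRESSION = ROW);

CREATE TABLE dbo.bigTransactionHistoryPAGE
(
    TransactionID BIGINT NOT NULL,
    ProductID     INT    NOT NULL,
    Quantity      INT    NOT NULL,
    CONSTRAINT PK_bigTransactionHistoryPAGE PRIMARY KEY CLUSTERED (TransactionID)
) WITH (DATA_COMPRESSION = PAGE);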

Let’s take a look at the number of pages in each table or index:

SELECT OBJECT_NAME(i.[object_id]) AS TableName,
       i.name AS IndexName, SUM(s.used_page_count) AS IndexPages,
       FORMAT(1 - SUM(s.used_page_count) * 1.0 / CASE WHEN i.name LIKE 'IX%' THEN 131819 ELSE 143645 END, 'P') AS PercentSaved
FROM sys.dm_db_partition_stats AS s
JOIN sys.indexes AS i
    ON s.[object_id] = i.[object_id] AND s.index_id = i.index_id
WHERE OBJECT_NAME(i.[object_id]) LIKE 'bigTransactionHistory%'
      AND OBJECT_NAME(i.[object_id]) <> 'bigTransactionHistoryTEST'
GROUP BY i.[object_id], i.name
ORDER BY IndexPages DESC;


The uncompressed table is the largest. The non-clustered index compressed with row compression saved about 30% of the space, and the clustered index saved about 37%. The page-compressed table is really interesting: the clustered index is much smaller than the original table, while the non-clustered index is about 39% smaller.

I’m going to turn on STATISTICS IO to compare the number of pages touched when I query each table:

SET STATISTICS IO ON;
GO

SELECT SUM(Quantity) AS ItemsPurchased
FROM bigTransactionHistory;

SELECT SUM(Quantity) AS ItemsPurchased
FROM bigTransactionHistoryROW;

SELECT SUM(Quantity) AS ItemsPurchased
FROM bigTransactionHistoryPAGE;


You can see that less I/O is needed for the compressed tables. That makes sense, since the clustered and non-clustered indexes occupy fewer pages when compressed.

The next query shows how many pages are in the buffer for each index:

SELECT COUNT(*) AS cached_pages_count, name, index_id
FROM sys.dm_os_buffer_descriptors AS bd
INNER JOIN
(
    SELECT OBJECT_NAME(object_id) AS name, index_id, allocation_unit_id
    FROM sys.allocation_units AS au
    INNER JOIN sys.partitions AS p
        ON au.container_id = p.hobt_id
       AND (au.type = 1 OR au.type = 3)
    UNION ALL
    SELECT OBJECT_NAME(object_id) AS name, index_id, allocation_unit_id
    FROM sys.allocation_units AS au
    INNER JOIN sys.partitions AS p
        ON au.container_id = p.partition_id
       AND au.type = 2
) AS obj
    ON bd.allocation_unit_id = obj.allocation_unit_id
WHERE database_id = DB_ID() AND name LIKE 'bigTransactionHistory%'
GROUP BY name, index_id
ORDER BY cached_pages_count DESC;


Again, when an index resides on fewer pages, the data takes less space in memory. Notice that the clustered index was used for the page-compressed table; as we saw, the page-compressed clustered index is much smaller than its non-clustered index.

On this Azure VM, I have seen inconsistent results in the time to run the queries. For example, if I use DBCC DROPCLEANBUFFERS, or even restart SQL Server to ensure the data must be loaded from disk, one of these queries could run anywhere from 1 second to a minute. I suspect that the VM is sharing resources with other VMs, and that this causes the discrepancies. I decided to run each query in a loop to better see the difference. Inside the loop, I use DBCC DROPCLEANBUFFERS to make sure that the data is loaded from disk rather than from cache.

DECLARE @Count INT = 0;
WHILE @Count < 100
BEGIN
    DBCC DROPCLEANBUFFERS;
    SELECT SUM(Quantity) AS ItemsPurchased
    FROM bigTransactionHistory;
    SET @Count += 1;
END;

The loop against the uncompressed table took 3 minutes and 31 seconds. The row-compressed table took 3 minutes and 4 seconds and the page compressed table took 3 minutes and 18 seconds. So in this case, the row-compressed query performed the best, with the page-compressed query next.

  • Insert comparison

Let’s take a look at inserting data. In this script, I populated an uncompressed table and our two compressed tables with a loop of 1,000 inserts of 1,000 rows each. To make things fair, I ran DBCC DROPCLEANBUFFERS before each loop.

DBCC DROPCLEANBUFFERS;
GO

--2:40
DECLARE @Count INT = 0;
WHILE @Count < 1000
BEGIN
    INSERT INTO dbo.bigTransactionHistoryTEST
    SELECT *
    FROM bigTransactionHistory
    WHERE TransactionID BETWEEN @Count * 1000 + 1 AND (@Count + 1) * 1000;
    SET @Count = @Count + 1;
END;

The uncompressed inserts took 2 minutes and 40 seconds. The row-compressed inserts completed a bit faster at 2 minutes and 23 seconds, and the page-compressed inserts completed in 2 minutes and 14 seconds. The page-compressed inserts were actually fastest, and I suspect that it might be due to less I/O.

  • Update comparison

Let’s take a look at updates.

DBCC DROPCLEANBUFFERS;
GO

--1:48
DECLARE @Count INT = 0;
WHILE @Count < 1000
BEGIN
    UPDATE dbo.bigTransactionHistoryTEST
    SET Quantity = Quantity + 1
    WHERE TransactionID BETWEEN @Count * 1000 + 1 AND (@Count + 1) * 1000;
    SET @Count = @Count + 1;
END;

Again, I ran loops; in this case, 1,000 updates of 1,000 rows each. Updating the uncompressed table took 1 minute and 48 seconds. Updating the row-compressed table took 1 minute and 36 seconds. The page-compressed table was much slower at 2 minutes and 21 seconds. As mentioned earlier, Microsoft recommends page compression for workloads that have few updates, and you can see why: the updates are slower with page compression.

To summarize, I saw decent space savings, especially with page compression. The compressed tables performed better for both selects and inserts. Updates, however, performed noticeably worse with page compression.
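For reference, here are the measured timings side by side (100 select iterations; 1,000 insert and 1,000 update iterations):

Operation | Uncompressed | Row-compressed | Page-compressed
Selects   | 3:31         | 3:04           | 3:18
Inserts   | 2:40         | 2:23           | 2:14
Updates   | 1:48         | 1:36           | 2:21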


Summary:

This article gave an overview of data compression. Compression is now available in all editions of SQL Server, starting with 2016 SP1. You can implement row and page level compression, and you can also use the COMPRESS function. Compression can improve performance because of decreased I/O and memory pressure. I hope this article has been informative for you.


Dissecting SQL Server Execution Plans

I remember my days before becoming a Microsoft SQL Server PFE. I wanted to learn everything and know everything about SQL Server. However, getting hold of good resources was tough, as I didn’t have a mentor when I started down my journey to becoming a SQL Server Database Administrator.

Along the way I did pick up lots of books and references. One such book is Dissecting SQL Server Execution Plans.

I read this book before becoming a PFE, I still read it now, and I recommend everyone read it more than once.

Grant Fritchey (Blog|Twitter) wrote the book back in 2008, and I would still recommend reading it. This book will only help you become a better DBA and developer.

SQL Central, Jeff Moden, Dissecting SQL Server Execution Plans

http://www.sqlservercentral.com/articles/Book+Reviews/69019/

Amazon, SQL Server Execution Plans

http://www.amazon.com/gp/product/1906434026?ie=UTF8&tag=dkranchnet&linkCode=as2&camp=1789&creative=390957&creativeASIN=1906434026

SQL Central, Red Gate, EBook

http://www.sqlservercentral.com/articles/books/65831/?utm_source=ssc&utm_medium=weblink&utm_content=Grant&utm_campaign=sqltoolbelt

This post is cross-posted on my SQLCAN Blog, MSDN Blog, and SQL Server Consultation blog.

 

Posted on June 1, 2016 in Performance MSSQL

 


Index Creation Date

Often when troubleshooting performance-related issues, it is beneficial to know when new indexes were introduced. However, there is currently no easy way to track or obtain this information. That’s why one of my colleagues and a good friend, Mohamed Sharaf (Blog | Twitter), has submitted a request on Microsoft Connect, Expose index creation date in sys.indexes. Please take a few minutes to vote if you agree.

This post is cross-posted on my SQLCAN Blog, MSDN Blog, and SQL Server Consultation blog.

 

Posted on October 19, 2015 in Performance MSSQL

 


Difference between auto update statistics and auto update statistics asynchronously

Dear readers,

Please find below an interview question and its answer based on statistics in SQL Server.

How to find the auto update statistics and auto update statistics asynchronously options in SSMS:

Right-click on your database -> Properties -> Options

[Screenshot: Database Properties, Options page, showing the automatic statistics settings]

The query optimizer uses statistics to create query plans that improve query performance.

AUTO_UPDATE_STATISTICS Option

As shown in the screenshot above, its value is True by default.

When the automatic update statistics option, AUTO_UPDATE_STATISTICS, is on, the query optimizer determines when statistics might be out-of-date and then updates them when they are used by a query. Statistics become out-of-date after insert, update, delete, or merge operations change the data distribution in the table or indexed view. The query optimizer determines when statistics might be out-of-date by counting the number of data modifications since the last statistics update and comparing the number of modifications to a threshold. The threshold is based on the number of rows in the table or indexed view.
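As a rough worked example, using the classic threshold formula (about 500 modifications plus 20% of the table’s rows; recent versions use a more dynamic threshold): for a table with 100,000 rows, statistics would be considered out-of-date after roughly 500 + 0.20 × 100,000 = 20,500 modifications.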

The query optimizer checks for out-of-date statistics before compiling a query and before executing a cached query plan. Before compiling a query, the query optimizer uses the columns, tables, and indexed views in the query predicate to determine which statistics might be out-of-date. Before executing a cached query plan, the Database Engine verifies that the query plan references up-to-date statistics.

The AUTO_UPDATE_STATISTICS option applies to statistics objects created for indexes, single-columns in query predicates, and statistics created with the CREATE STATISTICS statement. This option also applies to filtered statistics.

Auto Update Statistics Asynchronously 

The default setting for this option is disabled.

The asynchronous statistics update option, AUTO_UPDATE_STATISTICS_ASYNC, determines whether the query optimizer uses synchronous or asynchronous statistics updates. By default, the asynchronous statistics update option is off, and the query optimizer updates statistics synchronously. The AUTO_UPDATE_STATISTICS_ASYNC option applies to statistics objects created for indexes, single columns in query predicates, and statistics created with the CREATE STATISTICS statement.

Statistics updates can be either synchronous (the default) or asynchronous. With synchronous statistics updates, queries always compile and execute with up-to-date statistics; when statistics are out-of-date, the query optimizer waits for updated statistics before compiling and executing the query. With asynchronous statistics updates, queries compile with existing statistics even if the existing statistics are out-of-date; the query optimizer could choose a suboptimal query plan if statistics are out-of-date when the query compiles. Queries that compile after the asynchronous updates have completed will benefit from using the updated statistics.

Consider using synchronous statistics when you perform operations that change the distribution of data, such as truncating a table or performing a bulk update of a large percentage of the rows. If you do not update the statistics after completing the operation, using synchronous statistics will ensure statistics are up-to-date before executing queries on the changed data.

Consider using asynchronous statistics to achieve more predictable query response times for the following scenarios:

  • Your application frequently executes the same query, similar queries, or similar cached query plans. Your query response times might be more predictable with asynchronous statistics updates than with synchronous statistics updates because the query optimizer can execute incoming queries without waiting for up-to-date statistics. This avoids delaying some queries and not others.
  • Your application has experienced client request time outs caused by one or more queries waiting for updated statistics. In some cases, waiting for synchronous statistics could cause applications with aggressive time outs to fail.
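If these scenarios fit your workload, the option can be enabled per database. A minimal sketch follows (the database name is a placeholder); note that AUTO_UPDATE_STATISTICS must also be ON for the asynchronous option to have any effect:

-- Enable asynchronous statistics updates; database name is a placeholder
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC ON;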

Thanks for reading!!

Your suggestions and likes will be appreciated.

 

Follow us:

LinkedIn, Slideshare, YouTube Channel, MSDN Posts, Who We Are

 
Leave a comment

Posted by on September 18, 2015 in Performance MSSQL

 

Tags:

Check and Enable Automatic Statistic Update on Database Level

As the SQL Server best practice goes, auto-create and auto-update statistics should be on most of the time; in general they are a very good thing for performance. You could try to figure out which columns need statistics, but it’s often better to let SQL Server do that for you. You can turn on the AUTO_CREATE_STATISTICS database option, and SQL Server will automatically determine when it needs statistics and create them for you.

Check for and enable the automatic statistics options on all databases where either one is disabled:

SELECT 'ALTER DATABASE [' + Name + '] SET AUTO_CREATE_STATISTICS ON;
GO
ALTER DATABASE [' + Name + '] SET AUTO_UPDATE_STATISTICS ON;
GO
' FROM sys.databases WHERE is_auto_update_stats_on = 0 OR is_auto_create_stats_on = 0;

Take the result and execute it in another session.


To check all my posts, you can go directly to my personal website or download this Excel sheet: My Technical POSTS.

View all my tips, LinkedIn, Slideshare, YouTube Channel, My posts in MSDN

 

Tags: , , ,

Bookmark lookup , Key lookup , RID lookup in SQL Server

Introduction:

Today I will show:

What is the meaning of bookmark lookup, key lookup, and RID lookup in SQL Server?

When does SQL Server use a bookmark lookup, key lookup, or RID lookup?

How can we remove a bookmark lookup, key lookup, or RID lookup?

Bookmark lookup, key lookup, RID lookup: meaning, and when SQL Server uses them

Read the rest of this entry »

 

Posted on June 17, 2013 in Index, Performance MSSQL

 

Display Table, Column, KeyCols, IncludeCols for All Indexes

Hi Guys

With this script you can return:

Table name

Index name

Key columns in the index

Included columns

=======================================================

SELECT '[' + Sch.NAME + '].[' + Tab.[name] + ']' AS TableName
    ,Ind.[name] AS IndexName
    ,SUBSTRING((
        SELECT ', ' + AC.NAME
        FROM sys.[tables] AS T
        INNER JOIN sys.[indexes] I ON T.[object_id] = I.[object_id]
        INNER JOIN sys.[index_columns] IC ON I.[object_id] = IC.[object_id]
            AND I.[index_id] = IC.[index_id]
        INNER JOIN sys.[all_columns] AC ON T.[object_id] = AC.[object_id]
            AND IC.[column_id] = AC.[column_id]
        WHERE Ind.[object_id] = I.[object_id]
            AND Ind.index_id = I.index_id
            AND IC.is_included_column = 0
        ORDER BY IC.key_ordinal
        FOR XML PATH('')
        ), 2, 8000) AS KeyCols
    ,SUBSTRING((
        SELECT ', ' + AC.NAME
        FROM sys.[tables] AS T
        INNER JOIN sys.[indexes] I ON T.[object_id] = I.[object_id]
        INNER JOIN sys.[index_columns] IC ON I.[object_id] = IC.[object_id]
            AND I.[index_id] = IC.[index_id]
        INNER JOIN sys.[all_columns] AC ON T.[object_id] = AC.[object_id]
            AND IC.[column_id] = AC.[column_id]
        WHERE Ind.[object_id] = I.[object_id]
            AND Ind.index_id = I.index_id
            AND IC.is_included_column = 1
        ORDER BY IC.key_ordinal
        FOR XML PATH('')
        ), 2, 8000) AS IncludeCols
FROM sys.[indexes] Ind
INNER JOIN sys.[tables] AS Tab ON Tab.[object_id] = Ind.[object_id]
INNER JOIN sys.[schemas] AS Sch ON Sch.[schema_id] = Tab.[schema_id]
-- WHERE Tab.name = 'MyTableNameHere'
-- uncomment to get single table indexes detail
ORDER BY TableName;

 

Posted on June 4, 2013 in Index, Script