In SQL Server, statistics can be created using CREATE STATISTICS command or using CREATE INDEX command. At the feature level, the statistical information created using CREATE STATISTICS command is equivalent to the statistics built by a CREATE INDEX command on the same columns. The only difference is that the CREATE STATISTICS command uses sampling by default while the CREATE INDEX command gathers the statistics with fullscan since it has to process all rows for the index anyway.
A typical command will look like:
CREATE STATISTICS [IX_Stats_City]
WITH SAMPLE 50 PERCENT;
In this command, we are sampling 50% of the rows. For bigger tables, a random sampling may not produce accurate statistics. Therefore, for bigger tables, you may need to use the resample option on UPDATE STATISTICS. The resample option will maintain the fullscan statistics for the indexes and sample statistics for the rest of the columns.
Statistical information is updated when approximately 20 percent of the data rows have changed. Though there are some exceptions to this rule, we will keep this guideline as generic. We can also manually update statistics using UPDATE STATISTICS.
To provide up-to-date statistics, the Query Optimizer needs to make smart query optimization decisions. It is generally best to leave the “AUTO UPDATE STATISTICS” database option ON (the default setting). This helps to ensure that the Optimizer statistics are valid, so that queries are properly optimized when they are run. Additionally, SQL Server uses AUTO_CREATE_STATISTICS, which causes the server to automatically generate all statistics required for the accurate optimization of a specific query.
From SQL Server 2005 version, SQL Server maintains modification counters on a per-column basis rather than a per-row basis as was done in earlier versions. Therefore, sysindexes.rowmodctr is an approximation of what earlier versions of SQL Server would have shown, but the column is not used to determine when auto statistics occurs.
In SQL Server 2000, sp_updatestats would iterate over all the objects in the database and update statistics for every object, regardless of whether there had been any changes to the table (that is, rowmodctr was zero). This has changed from SQL Server 2005, so that, if the sysindexes.rowmodctr value is zero, then the index/statistics is skipped because its statistics are already fully up to date. Running sp_updatestats on a database with objects requiring no update will send a message like:
0 index(es)/statistic(s) have been updated, 0 did not require update.
Though this command still works in the latest version of SQL Server, it is recommended to move to the new syntax of UPDATE STATISTICS.
This command was introduced with SQL Server 2005 version and is used similar to sp_updatestats interchangeably. Updating of statistics ensures that any query that runs get the up-to-date statistics to satisfy the query needs. A typical command would look like:
UPDATE STATISTICS Sales.SalesOrderDetail
WITH FULLSCAN, ALL
This command computes statistics by scanning all rows in the Sales.SalesOrderDetail table. FULLSCAN and SAMPLE 100 PERCENT have the same results. Use caution when using FULLSCAN on large tables as it can take time and also affect performance of the system. It is ideal to do the same during non-peak hours or during maintenance windows. FULLSCAN cannot be used with the SAMPLE option.
There are some conditions in which it might be appropriate to turn off auto statistics or disable it for a particular table. For example, when a SQL Server database is under very heavy load, sometimes the auto update statistics feature can update the statistics on large tables at inappropriate times, such as the busiest time of the day. In such cases, you may want to turn autostats off, and manually update the statistics (using UPDATE STATISTICS) when the database is under a comparatively lesser load.
UPDATE STATISTICS Sales.SalesOrderDetail
WITH FULLSCAN, NORECOMPUTE
The above command forces a full scan of all the rows in the Sales.SalesOrderDetail table, and turns off automatic statistics for the table. To re-enable the AUTO_UPDATE_STATISTICS option behavior, run UPDATE STATISTICS again without the NORECOMPUTE option. To know when the statistics were last updated, use the STATS_DATE function.
Similar to the STATS_DATE function, we can also use another command DBCC SHOW_STATISTICS for the same data for a specific table and index like:
DBCC SHOW_STATISTICS (‘[HumanResources].[Shift]‘, ‘PK_Shift_ShiftID’)
At the same time, you need to analyze what will happen if you turn off the auto update statistics feature. While turning this feature off may reduce some stress on your server, by not running at inappropriate times of the day, it could also cause some of your queries to be not properly optimized, which could put extra stress on your server during busy times. It is a fine line of trade off which only an experienced DBA can make based on application workload and query patterns.
As with many other optimization issues, you will need to test to see if turning this option on or off is more effective for your environment.
Statistics is an important concept inside SQL Server and keeping it up-to-date is essential. In this blog post, we have dealt with the basics of statistics and how to update the same. In future posts we will expand on the same.