Application and Server Management

Understanding SQL Server Statistics

Donabel Santos (Twitter: @sqlbelle) – April 25, 2011

“Statistics provides tools that you need in order to react intelligently to information you hear or read” – David Lane, 2003

If there’s an upcoming election and you are running for office, getting ready to go from town to town and city to city with your flyers, you will want to know approximately how many flyers to bring.

If you’re the coach of a sports team, you will want to know your players’ stats before you decide who to play, when, and against whom. You will often play a matchup game: even if you have 20 players, you might be allowed to play just 5 at a time, and you will want to know which of your players will best match up against the other team’s roster. And you don’t want to interview them one by one at game time (a table scan); you want to know, based on their statistics, who your best bets are.

Just like the election candidate or the sports coach, SQL Server uses statistics to “react intelligently” during query optimization. Knowing the number of rows, the density of values, the histogram, and the available indexes helps the SQL Server optimizer “guess” more accurately how best to retrieve the data. A common misconception is that if you have indexes, SQL Server will always use them to retrieve the records your query asks for. Not necessarily. Say you create an index on a column City and more than 90% of the values are ‘Vancouver’. If the optimizer knows this from the statistics, a query that filters on City = ‘Vancouver’ will most likely get a table scan instead of an index seek, because reading nearly every row through the index would likely cost more than simply scanning the table.
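The Vancouver example can be modeled in a few lines. This is a minimal Python sketch of the idea, not SQL Server’s actual costing: it estimates how many rows an equality predicate returns from a frequency summary of the column, and prefers a scan when that estimate exceeds a hypothetical tipping point (`SCAN_THRESHOLD`, set to 30% of the table purely for illustration). The real optimizer compares full plan cost estimates, which also depend on things like page counts and whether the index covers the query. The function names and sample data are made up.

```python
from collections import Counter

# Hypothetical fraction of the table above which this toy model prefers a scan.
# SQL Server does not use a fixed percentage; it compares estimated plan costs.
SCAN_THRESHOLD = 0.30


def estimate_rows(value_counts: Counter, value: str) -> int:
    """Estimated rows for `column = value`, read from the frequency summary."""
    return value_counts.get(value, 0)


def choose_access_path(value_counts: Counter, value: str) -> str:
    total = sum(value_counts.values())
    selectivity = estimate_rows(value_counts, value) / total if total else 0.0
    # A selective predicate (few matching rows) favours an index seek;
    # a predicate that matches most of the table favours a full scan.
    return "table scan" if selectivity > SCAN_THRESHOLD else "index seek"


# 91 of 100 rows are 'Vancouver', so the index on City is not selective for it.
city_stats = Counter({"Vancouver": 91, "Victoria": 5, "Kelowna": 4})
print(choose_access_path(city_stats, "Vancouver"))  # table scan
print(choose_access_path(city_stats, "Kelowna"))    # index seek
```

The same index is a good choice for ‘Kelowna’ and a poor one for ‘Vancouver’; only the statistics tell the optimizer which case it is in.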

For the most part, there *may* be little we need to do to keep our statistics up to date (depending on your configuration), but understanding statistics a little better helps us understand SQL Server query optimization a little more.

How are statistics created?

Statistics can be created in several ways:
- Statistics are automatically created for the key columns of each index you create.

- If the database setting Auto Create Statistics is on, SQL Server automatically creates single-column statistics on non-indexed columns that are used in query predicates.

- You can create statistics manually with the CREATE STATISTICS statement.

What do statistics look like?

If you’re curious, there are a couple of ways to peek at what statistics look like.

Option 1 – in SSMS, expand your table’s Statistics node, right-click a statistics object > Properties, then go to the Details page. Below is a sample of the statistics and histogram collected for one of the tables in my database.

Option 2 – you can use DBCC SHOW_STATISTICS ('<table name>', '<statistics name>') WITH HISTOGRAM

Histograms are a great way to visualize the data distribution in a column (the histogram is built on the first column of the statistics object).
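To see what the histogram columns mean, here is a simplified Python model (not SQL Server’s algorithm) that builds a step histogram with the same columns DBCC SHOW_STATISTICS ... WITH HISTOGRAM returns: RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS and AVG_RANGE_ROWS. The way it picks step boundaries is a toy assumption, and the `Step`, `build_histogram` and `estimate_equals` names and the `ages` data are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Step:
    range_hi_key: int
    range_rows: int           # rows strictly between the previous key and this key
    eq_rows: int              # rows equal to range_hi_key
    distinct_range_rows: int  # distinct values strictly inside the step
    avg_range_rows: float     # range_rows / distinct_range_rows, or 1 when that is 0


def build_histogram(values, max_steps=4):
    counts = Counter(values)
    keys = sorted(counts)
    if not keys:
        return []
    # Toy boundary choice: spread the step keys evenly over the distinct values,
    # from the minimum to the maximum. SQL Server's own algorithm (at most
    # 200 steps) picks boundaries differently.
    n = min(max_steps, len(keys))
    if n == 1:
        hi_keys = [keys[-1]]
    else:
        hi_keys = [keys[i * (len(keys) - 1) // (n - 1)] for i in range(n)]
    steps, prev = [], None
    for hi in hi_keys:
        inside = [k for k in keys if (prev is None or k > prev) and k < hi]
        range_rows = sum(counts[k] for k in inside)
        distinct = len(inside)
        avg = range_rows / distinct if distinct else 1.0
        steps.append(Step(hi, range_rows, counts[hi], distinct, avg))
        prev = hi
    return steps


def estimate_equals(steps, value):
    """Toy estimate for `column = value`: EQ_ROWS on a step key, else AVG_RANGE_ROWS."""
    prev = None
    for s in steps:
        if value == s.range_hi_key:
            return float(s.eq_rows)
        if value < s.range_hi_key and (prev is None or value > prev):
            return s.avg_range_rows
        prev = s.range_hi_key
    return 0.0  # beyond the last step; the toy model simply returns 0


ages = [18, 18, 21, 21, 21, 25, 30, 30, 42, 42, 42, 42, 55]
steps = build_histogram(ages, max_steps=3)
for step in steps:
    print(step)
print(estimate_equals(steps, 42))  # 3.0, although 42 actually occurs 4 times
```

The estimate for 42 comes from the average of the values inside its step, which is broadly how AVG_RANGE_ROWS is used for values that fall between step boundaries. A coarse histogram smooths over skew, which is one reason estimates and actual row counts can drift apart.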

How are statistics updated?

The default settings in SQL Server are to auto-create and auto-update statistics.

Notice that there are two (2) options for automatically updating statistics:
- Auto Update Statistics means that if an incoming query needs statistics that are stale, SQL Server updates the statistics first and then generates the execution plan.
- Auto Update Statistics Asynchronously, on the other hand, means that if an incoming query needs statistics that are stale, SQL Server generates the execution plan with the stale statistics and updates the statistics in the background afterwards.
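The interaction between staleness and the two update modes can be sketched as a simplified Python model (not SQL Server code). It assumes the commonly documented threshold: for a table with more than 500 rows, statistics are considered stale after roughly 500 + 20% of the rows have been modified, with a smaller dynamic threshold of sqrt(1000 × rows) used from SQL Server 2016 under compatibility level 130. It also reflects that staleness is only checked when a query that needs the statistics is compiled, so modifying data by itself does not refresh them. `StatsObject`, `compile_query` and the other names are hypothetical.

```python
import math
from dataclasses import dataclass


def modification_threshold(rows: int, dynamic: bool = False) -> float:
    """Approximate number of row changes before statistics count as stale.

    Classic rule: 500 changes for tables of up to 500 rows, otherwise
    500 + 20% of the rows. With dynamic=True, take the smaller of that and
    sqrt(1000 * rows), the rule used from SQL Server 2016 (compatibility level 130).
    """
    if rows <= 500:
        return 500.0
    classic = 500 + rows / 5  # 20% of the rows
    return min(classic, math.sqrt(1000 * rows)) if dynamic else classic


@dataclass
class StatsObject:
    rows_at_last_update: int
    modifications: int = 0            # row changes since the last update
    pending_async_update: bool = False
    dynamic_threshold: bool = False

    def is_stale(self) -> bool:
        limit = modification_threshold(self.rows_at_last_update, self.dynamic_threshold)
        return self.modifications >= limit

    def refresh(self, current_rows: int) -> None:
        self.rows_at_last_update = current_rows
        self.modifications = 0
        self.pending_async_update = False


def compile_query(stats: StatsObject, current_rows: int, async_update: bool) -> str:
    """Staleness is only checked here, when a query that needs the statistics compiles."""
    if not stats.is_stale():
        return "current statistics"
    if async_update:
        # Asynchronous: compile with the stale statistics now; refresh later in the background.
        stats.pending_async_update = True
        return "stale statistics (update queued)"
    # Synchronous: refresh first, then compile with the new statistics.
    stats.refresh(current_rows)
    return "freshly updated statistics"


stats = StatsObject(rows_at_last_update=10_000)
stats.modifications = 1_000   # below 500 + 20% of 10,000 = 2,500
print(compile_query(stats, 11_000, async_update=False))  # current statistics
stats.modifications = 3_000   # past the threshold
print(compile_query(stats, 13_000, async_update=True))   # stale statistics (update queued)
print(compile_query(stats, 13_000, async_update=False))  # freshly updated statistics
```

Under this model, inserting a few rows into a large table leaves its statistics unchanged until enough modifications accumulate and a query that uses them is compiled.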

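If you want to update statistics manually, you can use either sp_updatestats, which updates statistics for all user tables in the current database, or UPDATE STATISTICS <table name> [<statistics name>], which targets a single table or statistics object.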

How do we know statistics are being used?

One good check is to generate actual execution plans for your queries and compare the “Actual Number of Rows” and “Estimated Number of Rows” values for each operator.

If these numbers are (consistently) fairly close, then most likely your statistics are up to date and are being used by the optimizer for the query. If not, it’s time to revisit how often your statistics are created and updated.
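That comparison can be automated once you have the numbers. This is a minimal Python sketch, assuming you have already read each operator’s estimated and actual row counts out of an actual execution plan (extracting them is left out). The 10x `RATIO_LIMIT` is a hypothetical rule of thumb, not a SQL Server setting, and the operator names and row counts are made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical cut-off: flag operators whose estimate is off by 10x or more.
RATIO_LIMIT = 10.0


@dataclass
class PlanOperator:
    name: str
    estimated_rows: float
    actual_rows: int


def misestimate_ratio(op: PlanOperator) -> float:
    """How far apart the estimate and the actual are, as a factor >= 1."""
    # Treat 0 as 1 row so an empty result does not divide by zero.
    est = max(op.estimated_rows, 1.0)
    act = max(float(op.actual_rows), 1.0)
    return max(est, act) / min(est, act)


def suspicious_operators(ops, limit=RATIO_LIMIT):
    return [op.name for op in ops if misestimate_ratio(op) >= limit]


plan = [
    PlanOperator("Index Seek (IX_City)", estimated_rows=120, actual_rows=131),
    PlanOperator("Key Lookup", estimated_rows=120, actual_rows=131),
    PlanOperator("Clustered Index Scan (Orders)", estimated_rows=1_000, actual_rows=48_000),
]
print(suspicious_operators(plan))  # ['Clustered Index Scan (Orders)']
```

A large gap is a prompt to look at the statistics behind that operator, not proof that they are stale.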

What configuration settings should we set?

There may be cases when you want to disable automatic statistics updates temporarily, for example while you’re doing massive updates on a table and don’t want them slowed down by auto-updates.

However, for the most part, you will want to keep these SQL Server settings turned on:
- Auto Create Statistics
- Auto Update Statistics

References:

Rob Carrol. http://blogs.technet.com/b/rob/archive/2008/05/16/sql-server-statistics.aspx

Elisabeth Redei has an excellent 3-part series on SQL Server Statistics:
http://sqlblog.com/blogs/elisabeth_redei/archive/2009/03/01/lies-damned-lies-and-statistics-part-i.aspx
http://sqlblog.com/blogs/elisabeth_redei/archive/2009/08/10/lies-damned-lies-and-statistics-part-ii.aspx
http://sqlblog.com/blogs/elisabeth_redei/archive/2009/12/17/lies-damned-lies-and-statistics-part-iii-sql-server-2008.aspx

Excellent books that touch on statistics:
- Grant Fritchey & Sajal Dam. SQL Server 2008 Query Performance Tuning Distilled. Apress.
- Holger Schmeling. SQL Server Statistics. Red Gate.



Comments

  1. Prasanna says:

    Excellent post. Nice example. I understood the functionality of statistics; however, I am very curious to know when to run this update statistics script. When I ran it in the DB it took quite a long time, so I thought it must have been refreshing the statistics. But even after the run, the query took a long time to run. I was a bit confused about why it’s taking so long. Could you elaborate a little more on this case?

  2. Nick says:

    Hi. That’s a good explanation. But I tested it and there are some things I don’t get.

    I created a table, inserted some data and created statistics.

    Then I inserted some more data and the statistics hadn’t changed even though the auto update thing was set to true.

    So why didn’t my statistics change when the new data was inserted into the table?
