Page Life Expectancy as a metric has been discussed in SQL Server circles for a long time now, and it is still considered a key performance indicator for DBAs looking at the overall health of their database instance. There is also a lot of ambiguity around this metric, and my hope is to provide readers, especially those new to the world of SQL Server, with a short overview of the metric and how it can be investigated.
Here is a link to the MSDN definition of the Buffer Manager object, which provides counters on how SQL Server uses memory to store data pages and performs physical I/O for reads and writes, amongst other things.
Page Life Expectancy (PLE) is a measure of the duration (in seconds) a data file page is expected to stay in SQL Server’s buffer pool. Please keep in mind that this metric is not an aggregate or accumulation, but simply a point-in-time value that DBAs query out of the above-mentioned performance counter catalog. SQL Server will always look to see if data is in memory first; if not, it will have to go to disk and retrieve the data it needs to complete the requested transaction.
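As a quick illustration, the current point-in-time PLE value can be read from the `sys.dm_os_performance_counters` DMV, which exposes the Buffer Manager counters discussed above (the exact `object_name` prefix varies with the instance name, hence the `LIKE`):

```sql
-- Current Page Life Expectancy, in seconds, for the instance as a whole.
-- object_name is padded and prefixed per instance, so match with LIKE.
SELECT RTRIM([object_name]) AS [object_name],
       RTRIM(counter_name)  AS counter_name,
       cntr_value           AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Buffer Manager%'
  AND counter_name = 'Page life expectancy';
```

Running this on a schedule and logging the results is the simplest way to start building the trend line discussed below.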
Traditionally, the Page Life Expectancy value most DBAs and pre-packaged monitoring solutions would aim for is 300 seconds or more, meaning a data page will remain in memory for at least five minutes. The value of 300 seconds originated as a recommendation from Microsoft in the mid-2000s, and the general consensus amongst SQL Server DBAs nowadays is that aiming for a set value isn’t recommended. The 300-second threshold comes from the days when memory was limited; now it is quite common to see SQL Server running on systems with TBs of RAM at their disposal. There is really no set PLE value that can be recommended. What should be done instead is continuous monitoring of the PLE trend, remaining proactive when it drops and stays below the typical value for the SQL Server instance in question.
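One wrinkle worth knowing when establishing that typical value: on NUMA hardware, the instance-level Buffer Manager figure summarizes per-node values, so a single node under pressure can be hidden in the aggregate. The per-node values are available from the Buffer Node counter object in the same DMV:

```sql
-- Per-NUMA-node Page Life Expectancy; instance_name holds the node id.
SELECT RTRIM(instance_name) AS numa_node,
       cntr_value           AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Buffer Node%'
  AND counter_name = 'Page life expectancy';
```

On a single-node system this returns one row matching the Buffer Manager value; on larger servers, trending each node separately gives a more honest picture.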
Troubleshooting Low Page Life Expectancy
If Page Life Expectancy drops below the typical value range and stays there, it means there is increased demand and pressure on the buffer pool.
Can your SQL Server instance benefit from additional memory? Perhaps. One thing to look for is a steady and consistent increase in workload while the PLE is trending downward. If that is the case, it is probably time to increase the memory allocation. If the workload hasn’t changed, but there is increased demand on the Buffer Pool, it could be that outlier operations are using more memory.
Are there maintenance jobs running?
Update Stats or DBCC operations?
These types of outlier operations can cause large amounts of reads and could be the root cause of the drop in Page Life Expectancy. Quite commonly, a query plan has changed, perhaps from stale statistics, and is now performing an expensive operation that causes the added demand on the Buffer Pool.
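To identify which queries are driving those reads, a common starting point is the `sys.dm_exec_query_stats` DMV, ranking cached statements by logical reads (note this only covers plans still in cache, so it is a snapshot rather than a full history):

```sql
-- Top 10 cached statements by total logical reads:
-- likely suspects for Buffer Pool pressure.
SELECT TOP (10)
       qs.total_logical_reads,
       qs.execution_count,
       qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
       SUBSTRING(st.[text],
                 (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.[text])
                     ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```

A statement with a suddenly inflated `avg_logical_reads` is a good candidate for the stale-statistics or plan-change scenario described above.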
There are many ways to monitor the PLE metric over time (trend analysis), as well as the queries whose large volumes of reads are causing the pressure. The tried-and-true DMVs and Extended Events in SQL Server have been instrumental in gathering this data, but they are manual and time-consuming in nature and offer limited benefit when it comes to gaining a historical perspective on a metric’s performance over time. There are also solutions such as Quest’s Spotlight Cloud, which provide the out-of-the-box ability not only to track Page Life Expectancy over time but also to analyze the workload and identify which queries and/or outlier activities are causing pressure on the Buffer Pool.