The SQL Server Page Life Expectancy (PLE) metric has long been considered a key performance indicator for DBAs looking at the overall health of their database instances. PLE shows whether the system is under internal memory pressure using counters provided by the Buffer Manager object.
A Closer Look at Page Life Expectancy
PLE is a measure of the length of time (in seconds) a data file page is expected to stay in SQL Server’s buffer pool. This metric is not an aggregate or accumulation, but simply a point-in-time value that DBAs will query out of the Buffer Manager.
SQL Server only reads data pages from the buffer pool (i.e., logical read), so if the page isn’t in the buffer pool, it finds it on the disk (i.e., physical read) and moves the page to the buffer pool so it can do a logical read. This is a time-consuming process and can negatively affect performance.
What Is a “Good” PLE Value?
A high PLE value means a page is staying in the buffer pool longer, so SQL Server is less likely to have to go to disk looking for the data page, which makes the system run faster.
Historically, DBAs considered 300 seconds (five minutes) the PLE sweet spot. However, that number is fairly arbitrary. Microsoft recommended 300 as the PLE standard back in the 2000s when memory was limited.
Today, DBAs don’t focus on a “right” number because buckets of memory come standard on most systems. It’s not unusual for SQL Server to run on a system that has TBs of RAM at its disposal, so DBAs have adopted a formulaic approach to identifying a “good” PLE value:
Page Life Expectancy = 300 seconds for every 4 GB of RAM on your server
However, it’s arguably more important to continuously monitor PLE values for changes in consistency so you can identify memory problems and resolve them quickly.
If you work with a large volume of data, it’s important to note that bigger servers often have multiple PLEs. Each non-uniform memory access (NUMA) node gets its own PLE value, and then those numbers are calculated to get the server’s PLE value. For example, take the PLE value of the node x 1,000 (do this for all the NUMA nodes). Add the values of all nodes, then divide by the total number of NUMA nodes, then divide again by 1,000. This will give you the server PLE.
How to Determine Whether There Is a Problem with Page Life Expectancy
Fluctuations in PLE are normal because it is based on workload. Tracking high, average, and low trends can show you whether certain processes, like table scans or flushing the buffer cache, need to be tuned to improve PLE.
A good way to determine if there is a problem is if the normal PLE value range drops and stays low. This indicates that there is likely increased demand and pressure on the buffer pool.
Does this mean that you need to throw some more memory at the problem? Maybe. Maybe not.
Troubleshooting Low SQL Server Page Life Expectancy
There are several reasons why PLE values may be trending low. It’s important to troubleshoot the problem because the solution isn’t the same for every root cause. Here are three of the culprits most likely to be slowing down your PLE:
If workload is steadily increasing and PLE is decreasing, you are likely short on memory. Adding memory might help increase PLE, but it won’t make queries run more efficiently.
If the workload hasn’t changed, but there is increased demand on the buffer pool, it could be that outliers are using more memory. Check to see if there are maintenance jobs running or index rebuilds in progress.
Stale statistics can cause changes to the query plan. This increases demand on the buffer pool by causing expensive operations to run because they aren’t synced with new stats.
How to Fix Low Page Life Expectancy by Optimizing Queries
The best way to fix low PLE values is by going to the source and optimizing your SQL Server queries. This comes with an additional bonus because optimizing queries will improve the overall performance of your system at the same time.
There are several things you will want to do that will help you optimize queries for maximum improvement of PLE:
- Drop unused indexes
- Merge duplicate indexes
- Look for large queries
- Know what’s in the buffer pool
- Defrag indexes
- Update statistics
- Purge data
Tracking Page Life Expectancy Over Time
Although PLE is a point-in-time metric, looking at PLE over time is an important way to identify issues early and fix them quickly before performance is significantly impacted.
There are many ways to monitor the PLE metric over time and identify the queries whose transactions cause a large amount of reads. DMVs and extended events in SQL Server are the tried-and-true methods and have been instrumental in this process of gathering data. But they are also manual and time-consuming, and they offer limited benefits when it comes to gaining a historical perspective on metric performance over time.
A commercial solution like Spotlight Cloud not only provides DBAs with the ability to track PLE over time straight out of the box, but it also analyzes workload to identify which queries and outlier activities are causing pressure on the buffer pool so you can isolate and remediate the problem and optimize your SQL Server performance.
Originally published April 2019 and updated September 2020.