For database administrators who respond to SQL Server alerts at all hours of the day and night, the feeling of being overloaded is likely made worse by the constant barrage of notifications insisting that something needs attention. RIGHT. NOW.
SQL Server monitoring is crucial to maintaining high availability and tracking performance issues in your system, and alerts are hands-down the most efficient way to find out there is an issue. But it is possible to have too much of a good thing.
As the saying goes, “When everything is a priority, nothing is a priority.” Alert fatigue is real and can lead to you ignoring or dismissing events that are negatively affecting your users.
When you set up your SQL Server performance monitoring, it’s important to configure alarms mindfully and in a way that controls when, why, and how often you receive notifications. Here are four ways to manage alerts that will help alleviate alert overload and save what’s left of your sanity.
1. Turn Off the Alarms You Don’t Need
For a lot of DBAs, this is easier said than done. There is a small element of terror at the thought of choosing which alerts not to receive. Fortunately, there are some best practices you can implement that can make your FOMO a bit less painful.
One of the easiest things you can do is review the alert logs and shut off alerts that chronically turn out to be false positives. Odds are good that you won’t miss a real issue, and your brain will appreciate the break from reacting to unnecessary notifications.
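As a rough illustration of that log review, the sketch below counts how often each alert fired and how often it actually needed action, then lists the alerts that look like chronic noise. It assumes you can export alert history as (alert name, was it actionable) pairs; the function name, the sample alert names, and both cutoffs are hypothetical and should be tuned to your environment.

```python
from collections import defaultdict


def noisy_alerts(history, min_fired=10, max_actionable_ratio=0.1):
    """Return alert names that fire often but rarely need action.

    history: iterable of (alert_name, was_actionable) pairs (assumed export format).
    min_fired and max_actionable_ratio are illustrative cutoffs, not recommendations.
    """
    fired = defaultdict(int)
    actionable = defaultdict(int)
    for name, was_actionable in history:
        fired[name] += 1
        if was_actionable:
            actionable[name] += 1

    # Keep only alerts with enough history to judge and a low actionable share.
    return sorted(
        name
        for name, total in fired.items()
        if total >= min_fired and actionable[name] / total <= max_actionable_ratio
    )


# Hypothetical history: one alert that never needed action, one that always did.
history = [("cpu_spike", False)] * 20 + [("disk_full", True)] * 12
candidates = noisy_alerts(history)
```

Treat the result as a list of candidates to review, not as alerts to disable automatically.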
Another strategy comes from Google’s site reliability engineers (SREs). SREs are in charge of availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
The SRE teams have an Alert/Ticket/Log system in place to minimize alert overload by assigning each event a response based on how quickly human intervention is required. The three possible responses are:
- Alert: An alert is only sent if a person must immediately take action.
- Ticket: If the event requires action by a person, but it can wait until normal business hours, a ticket is submitted and goes through the normal channels.
- Log: If no action is required, the event is logged for diagnostics.
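The Alert/Ticket/Log triage above can be sketched as a small routing function. This is a minimal illustration, not Google's implementation; the `Event` fields and the `route` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    ALERT = "alert"    # a person must act immediately
    TICKET = "ticket"  # a person must act, but it can wait for business hours
    LOG = "log"        # no action required; keep for diagnostics


@dataclass
class Event:
    name: str
    needs_human: bool      # does a person have to act at all?
    needs_immediate: bool  # must they act right away?


def route(event: Event) -> Response:
    """Choose a response based on how quickly human action is required."""
    if event.needs_human and event.needs_immediate:
        return Response.ALERT
    if event.needs_human:
        return Response.TICKET
    return Response.LOG
```

The point of the design is that only one of the three branches ever wakes someone up; everything else is deferred or recorded.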
2. Use Smart Alarms to Quickly Get to the Root Cause of an Alert
When your phone blows up with notifications at 3 a.m., you don’t want to spend an hour poking around to fix the problem.
Smart Alarms not only tell you that you have a problem but also suggest ways to fix it and help you identify the root cause. Smart Alarms also provide historical data about the event so you know what happened immediately before and after the alert was triggered.
3. Prioritize Your Alerts to Identify the Most Urgent Issues
Not all alerts are created equal, so it’s important to configure your SQL Server performance monitoring tool so that it only sends alerts for the most important issues. By prioritizing alerts based on severity level, impact to the business or customers, and whether immediate action is required, you eliminate some of the noise generated by alerts that aren’t critical.
Focus on setting up alerts for issues that can cause your servers to go offline, severely corrupt data, or result in significant data loss (for example, errors of Severity 17 or higher and error messages 823, 824, and 825).
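The severity and error-number guidance above can be expressed as a simple filter. The sketch below is an assumption-laden illustration: the function and constant names are hypothetical, and the cutoffs simply restate the numbers in the text. The message numbers are checked separately from severity because error 825 (a read that succeeded only after retrying) is raised at a low severity, so a severity-only filter would miss it.

```python
# Cutoffs taken from the guidance above; names are hypothetical.
MIN_CRITICAL_SEVERITY = 17
CRITICAL_MESSAGE_IDS = {823, 824, 825}  # I/O and consistency errors


def is_critical(severity: int, message_id: int) -> bool:
    """Return True if an error should raise an immediate alert."""
    return (
        severity >= MIN_CRITICAL_SEVERITY
        or message_id in CRITICAL_MESSAGE_IDS
    )
```

Anything that fails this check can be routed to a ticket or a log instead of a notification.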
4. Manage Alarms by Applying Specific Thresholds and Rules
Setting thresholds and rules is a huge sanity saver because it will help you avoid being bombarded by multiple alerts in a short span of time.
When you define performance thresholds, SQL Server holds off on notifying you until a value for a specified metric reaches a concerning level, for example, when free disk space or free physical memory is dangerously low. This frees up DBAs to work on other tasks without constantly monitoring metrics.
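A threshold check of this kind might look like the following sketch. The metric names and limits are hypothetical placeholders, not recommended values, and a real monitoring tool would collect the readings for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    minimum: float  # notify only when the reading drops below this value


# Hypothetical limits for illustration only.
THRESHOLDS = [
    Threshold("free_disk_gb", 10.0),
    Threshold("free_memory_mb", 512.0),
]


def breached(readings, thresholds=THRESHOLDS):
    """Return the metrics whose current reading is below its threshold.

    readings: dict of metric name -> current value. Metrics with no
    reading are skipped rather than treated as breached.
    """
    return [
        t.metric
        for t in thresholds
        if t.metric in readings and readings[t.metric] < t.minimum
    ]
```

As long as `breached` returns an empty list, nobody needs to be notified.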
Setting rules for alerts lets you customize actions, such as how frequently you want to be notified. For example, you could set SQL Server to only send a notification when a specified alert has been triggered four times or if the alert contains a certain database object or user name.
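The two example rules above can be combined in a small sketch. The class and method names are hypothetical, and two behaviors are assumptions made here rather than stated in the text: the counter resets after each notification, and a keyword match notifies immediately without counting toward the total.

```python
from collections import Counter


class NotificationRule:
    """Notify on every Nth occurrence of an alert, or at once on a keyword."""

    def __init__(self, min_occurrences=4, keyword=None):
        self.min_occurrences = min_occurrences
        self.keyword = keyword.lower() if keyword else None
        self._counts = Counter()

    def should_notify(self, alert_name, description):
        # A description naming the watched object or user notifies at once.
        if self.keyword and self.keyword in description.lower():
            return True

        # Otherwise wait until the alert has fired enough times.
        self._counts[alert_name] += 1
        if self._counts[alert_name] >= self.min_occurrences:
            self._counts[alert_name] = 0  # start counting again
            return True
        return False


# Hypothetical usage: a watched table name and a blocking alert.
rule = NotificationRule(min_occurrences=4, keyword="dbo.Payroll")
results = [rule.should_notify("blocking", "session blocked") for _ in range(4)]
```

With this rule, the first three "blocking" events stay quiet and only the fourth produces a notification.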
As DBAs begin to navigate a new and very different business environment post-COVID-19, stress levels are sure to rise. Maintaining high availability and ensuring your SQL Server systems are secure and performing optimally will remain a big priority. But now is a good time to enlist SQL Server monitoring capabilities to take control of your alert configurations and get rid of the unnecessary noise.