Before you start to look at a SQL server monitoring tool, think about your specific environment:
- How many instances do you want to monitor?
- Are these in one location or dispersed?
- Do you need to monitor the operating system and/or hypervisor?
- How much history do you need to get an accurate picture of the boundaries of operation of your instance?
- Are they all on-premises or are some in the cloud?
- Are your teams distributed?
- Do you purchase software under a capital expenditure or operational expenditure budget?
- Can you afford to invest a lump sum up front on infrastructure and license or do you prefer to spread your costs over time?
- Do you have infrastructure and database instances available to dedicate to a monitoring tool?
- Do you have time to build a monitoring infrastructure?
- Do you have consistently high expertise across your team?
- Do you utilize junior resources for initial triage or depend on your experts for everything?
- Do you have time or resources internally to maintain monitoring infrastructure?
Should I Roll My Own?
I can declare our bias here. Quest Software has been building performance monitoring tools for the last 20 years. There is an excellent reason why we and many others like us have remained in this segment for so long and why we have a growing customer base. Performance monitoring done well ain't easy!
There are indeed some great ways to gather metrics from SQL Server using PerfMon, traces, DMVs, and XEvents, to mention a few. Doing this on a one-off basis for a single issue is well and good—if you have the time to invest in researching where and how to gather the data for that issue. Once the issues begin to mount up and the numbers of instances increase, this rapidly becomes unscalable.
There are several hundred metrics available that are worth tracking to get a complete picture of your SQL Server’s performance health. In addition to that, there is the SQL code that runs and the query plans associated with each execution of the same. Some metrics should be collected every second, some every hour, and some based on when the code executes. Some collection methods can impact the monitored instance and should be avoided.
Each metric will have different thresholds which would define its status. Particular instances may have levels that are nonstandard. Then you have to store all of this. The volume of data adds up VERY fast. You will need to put a strategy in place to purge detailed data on a regular basis and then, if required for trending, aggregate this data for reporting.
It is a LOT of work … and of course, every time a new SQL Server version comes out, you have a regression headache to deal with. Unless you actually want to sell a monitoring tool, I would strongly advise against rolling your own unless the volume of issues is low and the problems you have to solve are very specific.
What About Free Tools?
Free tools are often worth considering, especially for smaller teams with less critical instances. Think of it as the next step in the ladder of scalability after “roll your own.” Many of the entry-level commercial SQL server monitoring tools should have similar considerations. Consider the following:
- Does the tool cover a sufficient breadth of metrics to give you adequate coverage for all use cases across your monitored instances? Many free tools will give some sort of “customization” to add your own metrics. When “customization” is being used to fill gaps in functionality, then you will quickly find your team ending up “rolling their own” with the requisite distraction and maintenance headache.
- Does the tool support alerting? Is it preconfigured? Configuring alerting can be very time-consuming. Out-of-box alerting is a must to prevent many lost man-hours configuring someone else's tool. It should also facilitate the customization of alerts for the edge cases that do not conform to defaults.
- How and where is the data stored? Most free tools leave it to you to manage the storage of performance data. Be wary of “free” monitoring from cloud vendors. They do charge for the storage, and this can get big and expensive fast!
So, by all means, do exploit the free tools that are out there. Just be aware of their limitations and look out for the classic anti-patterns within your team, such as:
- More time spent fixing or maintaining the tool that using it to fix problems
- More money spent on infrastructure and storage
- Lots of data but no insights
- Not enough depth in diagnostics to solve issues
- Not scalable enough to suit your needs
If you notice any of the above, it should point to the need to upgrade to a more robust solution.
Typical Architecture of a SQL Server Monitoring System
When considering whether to go for a traditional on-premises solution or a hosted software as a service (SaaS) solution, it is helpful to consider the monitoring application architecture. Here is a summary of the key architectural components.
The key deviation between SaaS and on-premises relates to where the performance data is stored and who manages this repository. For an on-premises solution, this is the responsibility of the end user. These repositories can get big fast, so they need to be carefully managed. The infrastructure needs to be planned and budgeted for (more below).
In a SaaS solution for SQL server monitoring, these key infrastructure components are hosted and managed for you.
|Traditional On-Premises Solution
For more details, check out our blog, Database Monitoring System Architecture.