The new challenge for monitoring solutions is to monitor infrastructure, software, and platforms that run in the cloud, or that are outsourced.
The various contract models with cloud providers and outsourcers no longer focus on infrastructure monitoring, such as monitoring the fans or power supply in a physical server, but rather on the availability and performance of applications, databases, services, and more.
These metrics are often defined in the contracts as targets that the provider has to meet.
IT departments therefore need to implement new requirements in their monitoring solution.
These requirements are to measure how well the defined targets (availability and performance) of the outsourced applications, services, etc. are met, and to present the result in a meaningful report.
Recently, I have been increasingly confronted with such scenarios and been asked to prepare some proposed solutions.
In the meantime, I have come to the conclusion that the following model or monitoring concept best meets these requirements.
I would represent the model in two levels.
In the first level I define three blocks: application performance monitoring (APM), end-to-end monitoring, and log management with SIEM.
On the second level there should be a system that collects the information from these three blocks and uses it to create reports, SLA calculations, and alerts.
In the field of APM there are, of course, several extensions that can be implemented, e.g. database tracing, collecting performance metrics from web applications, etc. It also means that different tools can be used depending on the needs, supplier, or manufacturer.
The use of an APM solution is important for analyzing performance issues in your applications.
Often, an end-to-end monitoring solution is deployed as part of APM. In my opinion, end-to-end monitoring in an outsourced or cloud environment should be seen as a separate pillar, and its importance should be emphasized accordingly.
This kind of end-to-end monitoring should already be in place before a cloud migration starts, so that you can determine whether performance or functionality has improved or deteriorated once the migration is complete.
In addition, an end-to-end monitoring solution is often the only way to monitor and evaluate the defined SLAs of services.
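To make this a bit more concrete, here is a minimal sketch of how the results of end-to-end checks could be evaluated against an SLA and compared before and after a migration. The names `ProbeResult`, `availability_pct`, `median_response_ms`, and `meets_sla`, as well as the thresholds, are my own assumptions for illustration and do not come from any particular tool.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class ProbeResult:
    """One end-to-end check (hypothetical structure)."""
    ok: bool            # did the simulated user transaction succeed?
    response_ms: float  # how long the transaction took


def availability_pct(results: List[ProbeResult]) -> float:
    """Share of successful checks, in percent."""
    if not results:
        raise ValueError("no probe results to evaluate")
    return 100.0 * sum(r.ok for r in results) / len(results)


def median_response_ms(results: List[ProbeResult]) -> Optional[float]:
    """Median response time of the successful checks, or None if all failed."""
    times = [r.response_ms for r in results if r.ok]
    return median(times) if times else None


def meets_sla(results: List[ProbeResult],
              min_availability_pct: float,
              max_median_ms: float) -> bool:
    """True if both the availability and the performance target are met."""
    med = median_response_ms(results)
    return (availability_pct(results) >= min_availability_pct
            and med is not None
            and med <= max_median_ms)


# Example: the same check before and after a migration (made-up numbers).
before = [ProbeResult(True, 200), ProbeResult(True, 300),
          ProbeResult(False, 0), ProbeResult(True, 400)]
after = [ProbeResult(True, 150), ProbeResult(True, 180),
         ProbeResult(True, 210), ProbeResult(True, 160)]

for label, run in (("before", before), ("after", after)):
    print(label, availability_pct(run), median_response_ms(run))
```

Because the same checks run unchanged before and after the migration, the two sets of numbers are directly comparable, which is exactly why the baseline has to be recorded early.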
The use of a Log Management solution with SIEM makes it possible to collect logs from the various services, applications and servers, and to carry out and present any necessary correlations. In the first instance, error detection may be less important than covering the various auditing requirements and implementing the requirements of a security operations center.
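As a rough illustration of what such a correlation can look like, the following sketch flags users with several failed logins within a short time window, regardless of which system reported them. The event fields (`user`, `action`, `source`, `time`), the function name `correlate_failed_logins`, and the thresholds are assumptions chosen for this example; a real SIEM would express this as a rule in its own language.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Set


def correlate_failed_logins(events: List[dict],
                            window: timedelta = timedelta(minutes=5),
                            threshold: int = 3) -> Set[str]:
    """Return users with at least `threshold` failed logins inside `window`,
    counted across all log sources."""
    failures: Dict[str, List[datetime]] = defaultdict(list)
    for event in events:
        if event["action"] == "login_failed":
            failures[event["user"]].append(event["time"])

    flagged: Set[str] = set()
    for user, times in failures.items():
        times.sort()
        start = 0
        # Sliding window over the sorted timestamps of this user.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(user)
                break
    return flagged


# Example events from three different (hypothetical) sources.
t0 = datetime(2024, 1, 1, 12, 0)
events = [
    {"user": "alice", "action": "login_failed", "source": "vpn", "time": t0},
    {"user": "alice", "action": "login_failed", "source": "web",
     "time": t0 + timedelta(minutes=1)},
    {"user": "alice", "action": "login_failed", "source": "mail",
     "time": t0 + timedelta(minutes=2)},
    {"user": "bob", "action": "login_failed", "source": "web", "time": t0},
    {"user": "bob", "action": "login_ok", "source": "web",
     "time": t0 + timedelta(minutes=1)},
]
print(correlate_failed_logins(events))
```

None of the three sources on its own would see anything unusual here; the pattern only becomes visible once the logs are collected in one place.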
Finally, all of these functions must provide their reporting and SLA information. The reports on the defined SLAs should be created automatically on a regular basis and sent to the required groups of people. At the same time, it should also be possible to quickly create individual reports.
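For the SLA part of such a report, the basic arithmetic is simple. The sketch below, with the hypothetical helpers `allowed_downtime_minutes` and `sla_status`, converts an availability target into a downtime budget for the reporting period and compares it with the measured downtime. The 99.9 % target and the 30-day period are only example values.

```python
def allowed_downtime_minutes(target_pct: float, period_days: int = 30) -> float:
    """Downtime budget in minutes for an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - target_pct / 100)


def sla_status(measured_downtime_min: float,
               target_pct: float,
               period_days: int = 30) -> dict:
    """Summarize one service for an SLA report."""
    total_minutes = period_days * 24 * 60
    budget = allowed_downtime_minutes(target_pct, period_days)
    achieved = 100 * (1 - measured_downtime_min / total_minutes)
    return {
        "target_pct": target_pct,
        "achieved_pct": round(achieved, 3),
        "budget_min": round(budget, 1),
        "downtime_min": measured_downtime_min,
        "sla_met": measured_downtime_min <= budget,
    }


# Example: a 99.9 % target over 30 days allows roughly 43 minutes of downtime.
print(sla_status(measured_downtime_min=30, target_pct=99.9))
print(sla_status(measured_downtime_min=60, target_pct=99.9))
```

A scheduled job could run this kind of calculation per service and send the result to the relevant groups, while the same function remains available for ad hoc reports.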
To simplify the management of all alarms, the entire alerting mechanism should be controlled by the system at the second level.
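A minimal sketch of this idea, assuming a hypothetical `AlertRouter` class and made-up recipient names: alerts from all three blocks arrive at one place, duplicates for the same service and severity are suppressed, and the recipient is chosen by one central rule set.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Alert:
    source: str    # which block raised it, e.g. "apm", "e2e", "logs"
    service: str   # affected service
    severity: str  # e.g. "critical" or "warning"


class AlertRouter:
    """Central alert handling on the second level (illustrative only)."""

    def __init__(self, routes: Dict[str, str], default: str = "ops-team"):
        self.routes = routes
        self.default = default
        self._open: Set[Tuple[str, str]] = set()

    def handle(self, alert: Alert) -> Optional[str]:
        """Return the recipient to notify, or None if the alert is a duplicate."""
        key = (alert.service, alert.severity)
        if key in self._open:
            return None  # already reported by another block
        self._open.add(key)
        return self.routes.get(alert.severity, self.default)

    def resolve(self, service: str, severity: str) -> None:
        """Mark a problem as resolved so that new alerts are sent again."""
        self._open.discard((service, severity))


router = AlertRouter(routes={"critical": "on-call"})
print(router.handle(Alert("apm", "webshop", "critical")))  # first report
print(router.handle(Alert("e2e", "webshop", "critical")))  # same problem again
print(router.handle(Alert("logs", "webshop", "warning")))  # different severity
```

Keeping the deduplication and routing rules in one system means they only have to be maintained once, no matter how many tools feed into it.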
Finally, I would like to point out that many organizations also use this central collection point to keep control of their cloud systems and their outsourcing costs.
I know that explaining all these topics in detail would be long and difficult, but in this blog post I have tried to summarize the most important steps and components for successful cloud monitoring.