Downtimes play an important part in correctly creating and interpreting an SLA report. While downtimes and status changes can be scheduled in advance, unexpected changes need to be fixed or sanctioned retroactively.
Let me give you a practical example to better explain what I mean:
Think of an ISP that has agreed on a certain SLA with its customer, let’s say an availability of 99.8%. The ISP provides an excellent service and always reaches the agreed availability. At some point, the customer’s internet connection is interrupted and the availability falls below the agreed SLA. The interruption, however, is not caused by poor service from the ISP but by third-party activity, for example road works that accidentally cut a cable. In this case, the interruption should not influence the calculation of SLA compliance, because it was not the ISP’s fault. To be as transparent as possible towards its customer, the ISP should be able to exclude the downtime from the SLA report and to add a written note for future traceability.
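To put some numbers on this example, here is a minimal Python sketch of the arithmetic. The function name `availability`, the dates, and the three-hour outage are assumptions made up for illustration; this is not how NetEye or Thruk computes availability internally.

```python
from datetime import datetime


def availability(period_start, period_end, outages, excluded=()):
    """Return the availability of a period in percent.

    `outages` and `excluded` hold (start, end) datetime pairs. An outage
    listed in `excluded` is treated as if it had never happened.
    """
    total = (period_end - period_start).total_seconds()
    down = sum(
        # Clip each outage to the reporting period before measuring it.
        (min(end, period_end) - max(start, period_start)).total_seconds()
        for start, end in outages
        if (start, end) not in excluded
        and end > period_start
        and start < period_end
    )
    return 100.0 * (total - down) / total


# Hypothetical reporting period: April 2024 (30 days).
month_start = datetime(2024, 4, 1)
month_end = datetime(2024, 5, 1)
# Hypothetical three-hour interruption caused by the road works.
cable_cut = (datetime(2024, 4, 10, 9, 0), datetime(2024, 4, 10, 12, 0))

raw = availability(month_start, month_end, [cable_cut])
corrected = availability(month_start, month_end, [cable_cut], excluded=[cable_cut])

print(f"{raw:.2f}")        # 99.58, below the agreed 99.8%
print(f"{corrected:.2f}")  # 100.00 once the outage is excluded
```

With a 30-day month, 99.8% leaves room for roughly 86 minutes of downtime, so a single three-hour cable cut is enough to break the SLA unless it is excluded.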
Our approach to problems of this kind is not to correct just the SLA report, but to correct the log entries themselves and then calculate the SLA report from the “correct” logs.
The Event Correction is divided into two parts: the creation of the Event Correction and its application to the logs.
Creating an Event Correction requires relevant information like:
Host
Service (optional)
Backend
Corrected Status
Start Date
End Date
The idea is to define a period during which the current state is replaced with a new state, or during which a downtime is retroactively defined or removed.
Event Corrections can be created with the dedicated plug-in, or via the links on the Thruk page that calculates the availability of hosts and services.
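As a rough illustration, the information listed above could be modelled like this in Python. The class name `EventCorrection`, its field names, and the example values are my own assumptions for this sketch and do not reflect the actual data model of the plug-in.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class EventCorrection:
    """Hypothetical container for the fields of an Event Correction."""

    host: str
    backend: str
    corrected_status: str
    start: datetime
    end: datetime
    service: Optional[str] = None  # optional: omit to correct the host itself

    def __post_init__(self):
        # A correction only makes sense for a non-empty period.
        if self.end <= self.start:
            raise ValueError("end date must be after start date")

    def covers(self, moment: datetime) -> bool:
        """True if `moment` falls inside the corrected period (end exclusive)."""
        return self.start <= moment < self.end


# Example values are invented for illustration.
correction = EventCorrection(
    host="customer-router",
    backend="main",
    corrected_status="UP",
    start=datetime(2024, 4, 10, 9, 0),
    end=datetime(2024, 4, 10, 12, 0),
)
print(correction.covers(datetime(2024, 4, 10, 10, 30)))  # True
```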
Once an Event Correction exists, it is inserted into the log by adding new entries or by replacing incorrect entries, while maintaining the correct log structure. In this way, reports can be generated based on the original as well as the corrected data.
Please note that in order to view or manage such Event Corrections, the corresponding settings must be defined in the user profile.
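The following Python sketch shows the general idea on a deliberately simplified log. It assumes the log is just a time-sorted list of `(timestamp, state)` state changes, which is not the real log format, and the function `apply_correction` is a hypothetical helper rather than the plug-in's actual implementation.

```python
from datetime import datetime


def apply_correction(log, start, end, corrected_state):
    """Return a corrected copy of `log`; the original list is left untouched.

    `log` is a time-sorted list of (timestamp, state) state changes.
    Entries inside [start, end) are replaced by a single corrected entry.
    """
    before = [entry for entry in log if entry[0] < start]
    after = [entry for entry in log if entry[0] >= end]
    # States seen before `end`; the last one was in effect when the period ends.
    prior = [state for ts, state in log if ts < end]

    corrected = before + [(start, corrected_state)]
    # Restore the original state at `end`, unless the log already has an
    # entry exactly at that moment.
    if prior and not (after and after[0][0] == end):
        corrected.append((end, prior[-1]))
    return corrected + after


original = [
    (datetime(2024, 4, 10, 8, 0), "UP"),
    (datetime(2024, 4, 10, 9, 30), "DOWN"),
    (datetime(2024, 4, 10, 12, 0), "UP"),
]
# Correct only 10:00-11:00 of the outage to "UP".
fixed = apply_correction(
    original,
    datetime(2024, 4, 10, 10, 0),
    datetime(2024, 4, 10, 11, 0),
    "UP",
)
for timestamp, state in fixed:
    print(timestamp, state)
```

Because the function returns a new list instead of modifying its input, both the original and the corrected data remain available for reporting.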
After getting my Bachelor's degree in Computer Science, I applied for a position at Wuerth Phoenix, and I am continuing to pursue my Master's in Software Engineering while working.
Author
Lukas Franceschini