15. 12. 2023 Rocco Pezzani NetEye, Unified Monitoring

Troubleshooting Icinga Notifications

I don’t really know the reason behind it, maybe because the typical scenario for notifications is just “send all events to this mailing list”, or as we say: set it and forget it. But we shouldn’t use this as an excuse: monitoring projects now consist of tens of thousands of objects (hosts plus services), and it’s simply impossible to just send everything to a simple mailing list: recipients would die under the weight of the email (or even worse, alerts would be ignored completely).

And alternative notification methods, like Jira issues, OpsGenie incidents, Microsoft Teams, Telegram, etc., are rapidly replacing email, and for good reason. But they currently require additional configuration both on-premises and in the Cloud, so when notifications stop working, it’s fundamental to understand what exactly is at fault: NetEye/Icinga 2, the Cloud, or the integration between them?

Personally, I think it’s time to provide some tools (or rather, methods) that can be used to understand if the notification source is at fault (literally: is NetEye sending notifications the wrong way?). Don’t misunderstand me, I’m not trying to teach how to troubleshoot notifications here, I just want to point out the instruments that you can use to understand if everything on the Icinga 2 side has been configured properly. Maybe some of you already know a bit about it (or everything, even), while others might think these are too basic. But they’re still the first places to look for information when you need it.

Is NetEye Sending Notifications?

If you’re a NetEye Administrator, people will ask you: “is NetEye sending notifications?”, or “I didn’t get any email for the last issue”, or even an angry “all notifications are broken”. Those are different versions of the same underlying fact: “I (the end user) am not receiving notifications for a specific host/service”.

It’s not a general issue, nor an infrastructure malfunction: just a simple misconfiguration, and the very first step is to prove that this misconfiguration really exists. After that, we can widen the troubleshooting radius, but only after. Thus our first action is to check if an issue related to this missing notification exists.

Is There a Problem Requiring a Notification?

Notifications should only be sent in case of real issues. To confirm that a malfunction really happened, check the history of the host/service. I know this might be seen as a lack of faith in the end user, but trust my experience: try to confirm everything the end user is saying in the most precise way possible. Otherwise, you’ll end up chasing ghosts.

If the event that the end user is talking about happened some time ago, you can use the Event Overview History: despite being slow, it does have a pretty complete search. It also allows time ranges in different formats, from a specific date/time, to deltas and human-readable strings. If you’re wondering what you can write in the Timestamp field, the authoritative source is https://www.php.net/manual/en/datetime.formats.php.

Now you can confirm the issue reported by the end user: if no problem is present, it isn’t a matter of the notifications, but of monitoring itself. Also, remember that notifications are only sent in case of hard state changes: if there is a problem but we’re in a soft state, then no notification will (or should) be sent. Again, it’s more of a general monitoring issue than anything else (excluding the end user being paranoid about soft states).
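To make the soft/hard distinction a bit more concrete, here is a deliberately simplified Python sketch. It is a toy model, not Icinga 2’s actual implementation: the class name, the field names and the exit-code convention (0 means OK) are my own assumptions, and it only assumes that a problem becomes hard after a configured number of consecutive failed checks.

```python
from dataclasses import dataclass

OK = 0  # Assumption: 0 is OK, any other value is a problem state.


@dataclass
class StateTracker:
    """Toy model of soft/hard states (hypothetical, for illustration only)."""

    max_check_attempts: int = 3  # failed checks needed before a state is hard
    attempt: int = 0             # consecutive non-OK results seen so far
    hard_state: int = OK         # last confirmed (hard) state

    def process(self, result: int) -> bool:
        """Feed one check result; return True on a hard state change."""
        if result == OK:
            self.attempt = 0
            changed = self.hard_state != OK
            self.hard_state = OK
            return changed
        self.attempt = min(self.attempt + 1, self.max_check_attempts)
        if self.attempt < self.max_check_attempts:
            return False  # still a soft state: nothing to notify
        changed = self.hard_state != result
        self.hard_state = result
        return changed


tracker = StateTracker(max_check_attempts=3)
# Three failures in a row are needed before a notification is due.
print([tracker.process(r) for r in [2, 2, 2, 2, 0]])
```

In this model, a problem that recovers before reaching the third attempt never produces a hard state change, which is exactly the situation where the end user sees a red check in the interface but receives nothing.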

Has a Notification for the Issue Been Sent?

Suppose you’ve confirmed the issue reported by the end user is real. Now we can find out if Icinga 2 tried to send a notification about it (not just to the end user, but to any user). There are four possibilities (the third one splits into two subcases):

  1. No notification has been sent: This case is easy. In History you won’t be able to find the Notification Event. The reason will be that notifications were not configured properly (or not configured at all) for the Host/Service Object.
  2. A notification should have been sent, but there is no user to receive it: This is pretty evident; the Notification Event is present, but says “This notification was not sent out to any contact”. This happens when the contacts in Icinga 2 were not correctly configured.
  3. A notification was sent to some contacts: Just like the previous case, you’ll find a Notification Event reporting the name of the contact that was notified, but no other details; there are two subcases:
    1. There is no Notification Event for the end user
    2. There is one or more Notification Events for the end user

Now, if you have Notification Events (case #3), you have to check whether a notification was sent to the end user. To do so, just look at all Notification Events to see if there is one referring to the end user. If you prefer an easier way, you can go to Notifications History and perform the same search: here, you can filter by the Notification Contact name to narrow the search even further.

As some friendly advice, don’t try to use Contacts from the Overview menu: it’s really slow and doesn’t have filtering capabilities. But let’s see what we can do case by case:

Case 1: Ensure a Notification Apply Rule exists and targets this specific host/service. If such a rule exists, ensure the required States and Transitions are listed in it. If it doesn’t, create one.

Case 2: There is at least one Notification Apply Rule targeting this specific host/service. Find it and look for the associated User Objects: they are likely missing some States or Transitions.

Case 3-A: There is at least one Notification Apply Rule targeting this specific host/service, but the end user is not listed in it. If possible, add the end user to it. If not, create a new Notification Apply Rule with the end user as the receiver.

Case 3-B: The issue is not inside Icinga 2. Maybe it’s on the NetEye Server side, but definitely not in Icinga 2, since it is trying to send Notifications for the requested event to the end user.
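The four cases can be summed up as a small decision function. This is only a sketch of the reasoning above: the function name and the input format (one list of notified contact names per Notification Event found in History) are hypothetical, and you would fill that input in by hand from what you see in the interface.

```python
def classify_notification_case(events, end_user):
    """Return "1", "2", "3-A" or "3-B" for the cases described above.

    events: one entry per Notification Event found in History; each
            entry is the list of contact names that event reports
            (an empty list means "not sent out to any contact").
    end_user: the contact name of the person who is complaining.
    """
    if not events:
        return "1"  # no Notification Event at all
    notified = {contact for contacts in events for contact in contacts}
    if not notified:
        return "2"  # events exist, but no contact was notified
    if end_user not in notified:
        return "3-A"  # someone was notified, but not the end user
    return "3-B"  # Icinga 2 did try to notify the end user


# Hypothetical example: two events, neither addressed to "alice".
print(classify_notification_case([["bob"], ["bob", "carol"]], "alice"))
```

Only the last outcome moves the investigation outside Icinga 2; the other three all point back to Apply Rules or User Objects.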

Just to be sure we’re on the same page: Notification-related and User-related objects can be modified using Icinga Director.

Am I Missing Something in My Notifications?

Sometimes it happens that a Notification event is there, but the end user still doesn’t receive any messages. Or, multiple notifications are sent to the same User. Or simply there’s something that doesn’t feel right about how you configured notifications (timing, repetitions, and so on). In these cases, there’s nothing more useful than inspecting the Running configuration.

First though, a bit of theory. Everyone talks about notifications as if you’re able to directly configure them through Director. This is wrong. Or better, this is not exactly what happens. Strictly speaking, what you configure in Director is not responsible for sending notifications. With Director, you configure a Notification Apply Rule. Please note this: it’s an Apply Rule. This means it’s not the actual object.

What Icinga 2 does is use the Notification Apply Rule to instantiate a Notification Object for each host/service it targets. Notification Objects are responsible for watching over the Object they are attached to and sending Notifications accordingly. Since a Notification Object is attached to a host/service, its name follows one of these two patterns:

  • If attached to Host Objects:
    • <Host Object Name>!<Notification Apply Rule Name>
    • <Host Object Name>: <Notification Apply Rule Name>
  • If attached to Service Objects:
    • <Host Object Name>!<Service Object Description>!<Notification Apply Rule Name>
    • <Host Object Name>: <Service Object Description>: <Notification Apply Rule Name>
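If you want to know in advance which names to look for, the two patterns above can be generated with a few lines of Python. The function name is my own, and the host, service and rule names in the example are hypothetical.

```python
def notification_object_names(host, rule, service=None):
    """Build both spellings of a Notification Object name.

    Returns a tuple: the "!"-separated form first, then the
    ": "-separated form, following the patterns listed above.
    """
    parts = [host, rule] if service is None else [host, service, rule]
    return "!".join(parts), ": ".join(parts)


# Hypothetical object names, for illustration only.
print(notification_object_names("web01", "mail-oncall"))
print(notification_object_names("web01", "mail-oncall", service="HTTP"))
```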

Now, to understand what really happens, you have to get all Notification Objects for the host/service of interest and check their configuration. The easiest way to do that is to go through Director, following these steps:

  1. Open Director
  2. Go through: Icinga Infrastructure -> Endpoints
  3. In the list of Endpoints, pick the Master Endpoint (it’s the one in the Master Zone, with the weird symbol to the right of its name)

If you open the Inspect tab, you’ll get a Tree, where each leaf has a corresponding Object Type in your Icinga 2 Instance. You can dig through this page as much as you want to see all objects in your Runtime, but be aware:

  1. The more objects you have, the slower it will be, so be patient
  2. This view is very technical: you can expect to see definitions of Objects you use every day, like Hosts, Services or Notifications; other Object types require that you carefully read the Icinga 2 Guide

Now, if you click on the Notification item from the tree, you’ll get a list of all Notification Objects currently active in your Icinga 2 Instance.

The list isn’t practical: it’s not sorted and is usually beyond huge. A suggestion: use your browser’s search function.

Using the naming convention reported above, you can easily understand if the Objects you are interested in have the Notification Objects you hope for, or if there are too many. Just pay attention to the names.
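If the list is too long even for the browser search, you can copy the names out and filter them offline. The sketch below assumes the names use the "!" separator and that host, service and rule names do not themselves contain a "!"; the function name and the sample names are hypothetical.

```python
def rules_attached_to(names, host, service=None):
    """List the Apply Rule names that produced Notification Objects
    for one host (service=None) or for one service of that host."""
    expected = [host] if service is None else [host, service]
    rules = []
    for name in names:
        parts = name.split("!")
        # The last part is the rule name; everything before it must
        # match the host (and service) we are interested in.
        if len(parts) == len(expected) + 1 and parts[:-1] == expected:
            rules.append(parts[-1])
    return sorted(rules)


# Hypothetical names, as they could be copied from the Inspect list.
names = [
    "web01!mail-oncall",
    "web01!HTTP!mail-oncall",
    "web01!HTTP!teams-ops",
    "db01!mail-oncall",
]
print(rules_attached_to(names, "web01", "HTTP"))
```

An empty result means no Notification Object exists for that host/service, while more entries than you expected is the typical explanation for duplicate notifications.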

If you click on any item reported, you’ll see its full runtime configuration, including all details.

Everything is Correct, But Notifications are Still Not Sent

The configuration from Director is correct and Icinga 2 creates the right objects, but no notification is sent. Is this still possible? Yes. Does it mean there’s something wrong with the Message Infrastructure (Jira, OpsGenie, Ms Teams, Telegram etc.)? Don’t jump to that conclusion yet.

The last step in ruling out a fault within NetEye is to check that Icinga 2 is correctly engaging the Message Infrastructure. What am I saying? Simply that notifications are managed just like monitoring is. What Icinga 2 does is simply execute Plugins: Monitoring Plugins for monitoring, and Notification Plugins for notifications. Everything is subject to the same rule: you need a Notification Command, and the Plugin must execute flawlessly.

How can we be so certain? There’s no Inspect like there is on the monitoring side, so the only possible way is to search the Icinga 2 log. If there’s an issue with a Notification Plugin, it will be reported there, including the full command line Icinga 2 used to try to invoke it. Just grep for the name of the host/service in the file /neteye/shared/icinga2/data/log/icinga2/icinga2.log, or for PluginNotificationTask if you’re not sure.
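If you’d rather run this search from a script than with grep, here is a minimal Python sketch. The log path and the PluginNotificationTask string come from the paragraph above; the function names are my own, and the host name in the commented usage line is hypothetical.

```python
from pathlib import Path

# Path taken from the text above; adjust it if your setup differs.
ICINGA2_LOG = Path("/neteye/shared/icinga2/data/log/icinga2/icinga2.log")


def matching_lines(lines, needle="PluginNotificationTask"):
    """Keep only the lines containing the needle (plain substring match)."""
    return [line.rstrip("\n") for line in lines if needle in line]


def search_log(needle="PluginNotificationTask", path=ICINGA2_LOG):
    """Read the Icinga 2 log and return the lines mentioning the needle."""
    # errors="replace" avoids crashing on odd bytes in plugin output.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return matching_lines(handle, needle)


# Usage on the NetEye server (host name is hypothetical):
#     for line in search_log("web01"):
#         print(line)
```

Like grep, this is a plain substring match, so searching for a short host name may also return lines for other objects whose names contain it.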

Next Steps

This is what you should do to ensure NetEye is not the cause of your not-sending-notifications-to-someone. If everything is right and still no notifications are sent, you’ll have to act more like a System Administrator: you have to ensure all integrations configured on NetEye are working, from the simplest Postfix configuration (for email) to other helper services (like OpsGenie Edge Connector and so on), and ensure the dialogue between the integration and the Message Infrastructure is working.

This is not strictly the work of a NetEye Administrator, but rather of your trusted System Administrator.
