Lately, we’ve been working extensively on integrating Prometheus alerts into NetEye. In most cases, we process the alerts sent by Alertmanager (Prometheus’ alert management system) using Tornado, where they are grouped based on labels – not following the traditional “Host” or “Service” structure.
Our task is to “translate” these alerts into “hosts” and their corresponding “services” using Tornado, often breaking them down into individual cluster elements. Without delving too deeply into this topic – which could perhaps be the subject of another blog post – it’s clear even from this brief explanation that this process can sometimes be far from straightforward.
Of course, in most cases Tornado’s filtering, rule, and action management system can handle this flawlessly. However, the technical aspect isn’t the only important factor; we need to consider others as well. One such example is alert noise.
In large-scale and complex infrastructures, “alert noise” can become an increasingly serious issue. Concurrent alerts and notifications may overwhelm operations teams, reducing efficiency, increasing the number of missed or improperly handled incidents, and ultimately compromising system reliability.
The foundation of effective alert management lies in intelligent filtering, grouping, and routing of notifications. In the NetEye system, multiple approaches exist to mitigate alert noise – for example, precise event filtering with Tornado rules, delaying or escalating notifications, or even temporarily suppressing them (via the Downtime or Acknowledge features).
But for now, let’s explore what we can do in Alertmanager to prevent unnecessary alerts in the first place.
Prometheus is an industry-standard metrics collection tool, with Alertmanager serving as its alert management component. Alertmanager plays a central role in handling notifications, enabling their unified processing and routing to various channels like email, Slack, or in our case, NetEye – which allows alerts to be managed centrally and in a controlled manner.
While NetEye’s Event Manager (Tornado) also offers filtering capabilities for incoming events, the most efficient approach is to prevent unnecessary alerts at the Alertmanager level. Through Alertmanager’s configuration, you can apply various filters and rules, such as:
- grouping related alerts into a single notification (group_by)
- temporarily muting alerts with Silences
- suppressing dependent alerts with inhibition rules
All of these features effectively help reduce alert noise.
Let’s start with the first and simplest method: grouping. One of Alertmanager’s core features is its ability to group alerts (group_by, which consolidates alerts with matching labels into a single notification).
For example, with a simple configuration, we can group alerts sharing the same name:
route:
  # default, if there's no match anywhere
  receiver: blackhole
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 12h
  routes:
    - matchers:
        - severity=~'critical|warning'
      receiver: "NetEye"
      group_by: ['host']
In the example above, we filter alerts based on the severity label, ensuring only “critical” and “warning” notifications are forwarded to NetEye. Additionally, we group them by the host label.
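For completeness, the two receivers referenced above (blackhole and “NetEye”) also have to be defined in the same configuration file. A minimal sketch could look like this; the webhook URL is only a placeholder for your own NetEye / Tornado collector endpoint:

receivers:
  # receiver with no configuration: alerts routed here are simply dropped
  - name: "blackhole"
  # forward alerts to NetEye, e.g. via a Tornado webhook collector
  - name: "NetEye"
    webhook_configs:
      - url: "https://neteye.example.com/tornado/webhook/..."   # placeholder URL
        send_resolved: true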
While this setup helps, it’s not a complete solution. Alertmanager sends a single notification per group due to grouping – but this notification still contains all individual alert instances belonging to the group. If the receiving system (like Tornado) processes these instances separately, multiple events will still be generated despite the grouping.
In most cases, this is the expected behavior and is not an issue. However, in certain scenarios, it can still result in unnecessary alert volume.
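To make this concrete, here is a trimmed sketch of the webhook payload a receiver such as Tornado gets for one group (most metadata fields are omitted, and the alert names are purely illustrative):

{
  "receiver": "NetEye",
  "status": "firing",
  "groupLabels": { "host": "server1" },
  "alerts": [
    { "status": "firing", "labels": { "alertname": "HighCPUUsage", "host": "server1", "severity": "critical" } },
    { "status": "firing", "labels": { "alertname": "DiskAlmostFull", "host": "server1", "severity": "warning" } }
  ]
}

Even though this arrives as a single notification, Tornado still sees two separate alert instances in the alerts array.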
The “Silence” Feature
Alertmanager’s Silence function allows you to temporarily mute (disable) alerts – for example, during maintenance or for known issues – preventing unnecessary notifications. It works similarly to Downtime or Acknowledge in NetEye, but here, alerts are filtered using their labels, which can even include regular expressions. Each Silence has a defined start and end time and can be configured via Alertmanager’s web interface or API.
When Alertmanager receives alerts from Prometheus, it checks if they match any active Silence rule. If they do, no notification is sent. Note that the alert itself remains active in the system – only the notification is suppressed.
The following HTTP POST request creates a Silence that mutes all alerts where the alertname is “HighCPUUsage” and the instance label matches the regular expression server[12].example.com:
curl -XPOST http://alertmanager:9093/api/v2/silences -H 'Content-Type: application/json' -d '{
  "matchers": [
    {
      "name": "alertname",
      "value": "HighCPUUsage",
      "isRegex": false
    },
    {
      "name": "instance",
      "value": "server[12].example.com",
      "isRegex": true
    }
  ],
  "startsAt": "2025-06-29T12:00:00Z",
  "endsAt": "2025-06-29T16:00:00Z",
  "createdBy": "admin",
  "comment": "Muted during maintenance"
}'
Key Fields:
- matchers: the label conditions an alert must match, either as exact values or as regular expressions (isRegex)
- startsAt / endsAt: the time window during which the Silence is active
- createdBy: who created the Silence
- comment: a short explanation of why it was created
Unlike grouping (which consolidates alerts) or outright dropping them, Silence temporarily suppresses notifications while keeping the alerts active in the system.
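If you later need to review or remove Silences from the command line, the same v2 API can be queried (the silence ID in the second command is just a placeholder):

# list all existing Silences
curl http://alertmanager:9093/api/v2/silences

# expire a specific Silence by its ID
curl -XDELETE http://alertmanager:9093/api/v2/silence/<silence-id>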
Inhibition Rules
The third method for managing notifications in Alertmanager is inhibition rules. These configuration directives allow certain alerts to be suppressed if other related alerts are already active. The principle is simple: if a higher-priority alert (indicating the root cause) is already active, there’s no need to send lower-severity alerts for the same issue.
Unlike the Silence feature (which mutes alerts for a fixed duration), inhibition rules do not require a time interval. They work dynamically based on the alert hierarchy, automatically suppressing less critical alerts while a higher-priority one is firing, and focus on logical relationships between alerts rather than on manual muting.
In short: a less important alert is suppressed if a higher-impact issue has already been reported for the same scope.
The main components of a rule are:
- source_matchers: the conditions identifying the higher-priority (inhibiting) alert
- target_matchers: the conditions identifying the alerts to be suppressed
- equal: the labels that must have identical values in both alerts for the rule to apply
Their use is therefore logical and easy to follow in virtually any situation, and they can play a key role in reducing “alert noise”; for the NetEye + Tornado pair this is especially important. Let me demonstrate with a simple example.
As mentioned earlier, we typically use Tornado to break the incoming alert array down into its individual elements, creating or modifying the corresponding host and service statuses.
Let’s assume we have three alerting rules with the same name (“WindowsMemUsage”) but different severity and priority labels (avoiding redundant service names like “WindowsMemUsageWarning” or “WindowsMemUsageCritical”).
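A minimal sketch of what these rules could look like, assuming a hypothetical windows_memory_usage_percent metric (the metric name and the thresholds of the first two rules are illustrative; only the labels and the for: durations matter for this example):

groups:
  - name: windows-memory
    rules:
      # assumed metric name and 90% threshold, for illustration only
      - alert: WindowsMemUsage
        expr: windows_memory_usage_percent > 90
        for: 15m
        labels:
          severity: warning
          priority: medium
      - alert: WindowsMemUsage
        expr: windows_memory_usage_percent > 90
        for: 25m
        labels:
          severity: critical
          priority: high
      # immediately escalate when usage is very high
      - alert: WindowsMemUsage
        expr: windows_memory_usage_percent > 96
        for: 5m
        labels:
          severity: critical
          priority: emerge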
In summary: the first rule generates a “warning” status after 15 minutes, the second triggers a “critical” status after 25 minutes, and the third immediately triggers a “critical” status if memory usage exceeds 96% for 5 minutes. Without inhibition rules, multiple alerts may be active for the same issue at the same time, which is not only unnecessary but can also cause confusion in the final statuses.
It’s important to note that Alertmanager does not order the alerts within a notification randomly, but based on state transitions and alphabetical order. For example, if all three rules are active and no suppression is applied, groups like the following are possible:
Group 1:
[
  {alertname: "WindowsMemUsage", host: "server1", severity: "critical", priority: "emerge", status: "firing"},
  {alertname: "WindowsMemUsage", host: "server1", severity: "critical", priority: "high", status: "firing"},
  {alertname: "WindowsMemUsage", host: "server1", severity: "warning", priority: "medium", status: "firing"}
]

Group 2:
[
  {alertname: "WindowsMemUsage", host: "server2", severity: "critical", priority: "emerge", status: "resolved"},
  {alertname: "WindowsMemUsage", host: "server2", severity: "critical", priority: "high", status: "firing"},
  {alertname: "WindowsMemUsage", host: "server2", severity: "warning", priority: "medium", status: "firing"}
]
In example “Group 1”, among three alerts with identical “firing” status, alphabetical order determines precedence. As a result, during processing by Tornado, not only are three events unnecessarily processed, but the final status may incorrectly end up as “warning”, even though the situation is still “critical”.
In example “Group 2”, a “resolved” alert is also present. “Resolved” alerts always appear at the beginning of the group array (alphabetically, if there is more than one), followed by the “firing” alerts, also in alphabetical order. So in this example the first element will always be the “resolved” one, which is fine; however, between the two alerts with identical “firing” status, alphabetical order again determines precedence. This may again result in an incorrect final status of “warning”.
The ideal behavior should always be that the alert with higher priority suppresses the lower-priority one. We can achieve this in Alertmanager for the above scenarios using the following inhibition rules:
inhibit_rules:
  - source_matchers:
      - alertname = "WindowsMemUsage"
      - priority = "high"
    target_matchers:
      - alertname = "WindowsMemUsage"
      - priority = "medium"
    equal: [host]
  - source_matchers:
      - alertname = "WindowsMemUsage"
      - priority = "emerge"
    target_matchers:
      - alertname = "WindowsMemUsage"
      - priority =~ "high|medium"
    equal: [host]
With these rules, which must be placed directly in Alertmanager’s configuration file, when the “high” priority alert is active, the “medium” one is suppressed. If the “emerge” priority alert comes into effect, it silences both the “high” and “medium” level alerts. This ensures that only the most important alert is ever sent. If we revisit the previous examples, our arrays will now look as follows:
Group 1:
[
  {alertname: "WindowsMemUsage", host: "server1", severity: "critical", priority: "emerge", status: "firing"}
]

Group 2:
[
  {alertname: "WindowsMemUsage", host: "server2", severity: "critical", priority: "emerge", status: "resolved"},
  {alertname: "WindowsMemUsage", host: "server2", severity: "critical", priority: "high", status: "firing"}
]
It’s evident that the “unnecessary” notifications have been suppressed and no longer appear in the array, so they will not cause problems during processing. Note that the “resolved” notification (as in Group 2) is still sent out; it cannot be suppressed, but since the “firing” alert always follows it in the array, the final status will correctly be “critical”.
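If you want to double-check which alerts Alertmanager is currently inhibiting, the v2 API exposes this in each alert’s status field (the jq filter is optional and only trims the output, assuming jq is installed):

curl -s http://alertmanager:9093/api/v2/alerts | jq '.[] | {alertname: .labels.alertname, priority: .labels.priority, state: .status.state, inhibitedBy: .status.inhibitedBy}'
# inhibited alerts are reported with "state": "suppressed" and a non-empty "inhibitedBy" list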
Alertmanager’s filtering rules are essential tools in professional alert management. When configured correctly, they can significantly reduce alert noise, ensuring that only the most important alerts reach operators – which is especially important in the case of the NetEye + Tornado pair. With proper settings, the focus can be shifted to critical issues, highlighting alerts that require immediate attention. This improves troubleshooting efficiency and speeds up problem identification.
Did you find this article interesting? Does it match your skill set? Our customers often present us with problems that need customized solutions. In fact, we’re currently hiring for roles just like this and others here at Würth Phoenix.