Hello to you all. It’s been a while. Don’t worry though, this won’t be a long and technical post. It’s just to let you know I’m doing (almost) well and to tell you about our latest news.
In the last year we’ve had a lot on our plate, but this hasn’t affected our capacity to design and implement new solutions with NetEye. Recently we were tasked with solving a new quest: use real-time metrics to implement near-real-time monitoring.
My first thought was “Ok, let’s do this the usual way: store metrics inside InfluxDB, then use NEP InfluxDB Query to process performance data”. But when I asked “How many devices are we talking about?”, the answer was “The whole Infrastructure”. That’s when I realized we weren’t talking about one or two hundred metrics per minute, but around ten thousand metrics per second, or perhaps even more.
At that moment I understood the meaning of “deafening silence”: everyone in the room froze for I-don’t-know-how-many seconds. Then I collected myself off the floor and said “Give me some time to think about it”.
The fact is that some years ago, while talking with our Division Head about the pros and cons of Metric-based Monitoring, I had been adamant that it was the best solution for monitoring and that we absolutely had to do it. In that room, the chance to prove my claim dropped straight into my lap, so my pride forced me to at least present a possible solution. I had no idea of the challenge ahead.
It was an unheard-of thing. I mean, up until now, everyone had avoided this topic. Using metrics to perform near-real-time monitoring with a plugin-driven system (i.e., Icinga 2) is almost suicidal: you have to query InfluxDB one or more times every minute for each Service Object you monitor.
And this cannot be delegated to Satellites, because the data is centralized on NetEye (Single Node or Cluster). This puts very heavy stress on the monitoring infrastructure, and it simply doesn’t scale: you will kill your NetEye Cluster and have nothing to show for it.
We quickly started looking for a solution. The first idea was an approach that already runs on NetEye Cloud to check the status of all Elastic Agents and their data sources: use a Poller to fetch all metrics in one sweep, then set the status of all related services via Tornado. This definitely improves scalability, but the data backend is still InfluxDB: in a NetEye Cluster, InfluxDB has High Availability support, but only for the data. InfluxDB itself remains a single-instance service, so this setup still cannot scale.
Running more InfluxDB instances could improve scalability, but the resulting infrastructure would be a nightmare to manage: load balancing would be “manual” (no dynamic relocation of data between instances), and there would still be no real High Availability. We therefore needed to switch to another architecture.
As I mentioned before, on NetEye Cloud we already use a promising strategy: run One Big Query to get all the data for the same class of services, process it, and set the status of the related Services using Tornado. This let us set several thousand services using just a handful of Elasticsearch queries.
Since Elasticsearch is designed to handle lots of data per query efficiently, the overall workload decreased; Icinga and Tornado also kept pace, updating the status of several thousand services per minute, resulting in great performance and scalability improvements.
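To make the “One Big Query” idea concrete, here is a minimal sketch of what such a query could look like: a single Elasticsearch aggregation that returns the latest sample for every host at once, instead of one query per Service Object. The index pattern and field names (`host.name`, `@timestamp`, `system.cpu.total.pct`) are illustrative assumptions, not the actual queries used in the project.

```python
# Hypothetical sketch of a "One Big Query": bucket by host, keep each
# host's newest sample. Field names are assumptions for illustration.

def build_one_big_query(metric_field: str, minutes: int = 1) -> dict:
    """Build an Elasticsearch query body that fetches the latest value
    of `metric_field` for every host seen in the last `minutes` minutes."""
    return {
        "size": 0,  # we only need the aggregation buckets, not raw hits
        "query": {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
        "aggs": {
            "per_host": {
                # one bucket per host; size must cover the whole fleet
                "terms": {"field": "host.name", "size": 60000},
                "aggs": {
                    "latest": {
                        # newest document in each host's bucket
                        "top_hits": {
                            "size": 1,
                            "sort": [{"@timestamp": {"order": "desc"}}],
                            "_source": [metric_field],
                        }
                    }
                },
            }
        },
    }

query = build_one_big_query("system.cpu.total.pct")
```

One request like this replaces tens of thousands of per-service InfluxDB queries, which is exactly where the scalability gain comes from.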
So, why not apply this approach to metrics as well? Suppose we’re receiving 10 metrics per host every minute. Then with 60K different hosts sending metrics we’re getting a flow of 10K EPS. That’s nothing for Elasticsearch.
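The back-of-envelope math from the numbers above, just to show where the 10K EPS figure comes from:

```python
# Ingest-rate estimate using the figures from the text.
hosts = 60_000
metrics_per_host_per_minute = 10
eps = hosts * metrics_per_host_per_minute / 60  # events per second
print(eps)  # → 10000.0
```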
And we would still achieve a higher resolution than a standard plugin-based monitoring approach (i.e., a poll every 3 minutes). So, why not invest in a “sophisticated” poller script to process the metrics? Each class of metrics you need to analyze requires just one dedicated query.
So we would need to develop somewhere between 30 and 50 different queries: a perfectly doable task. And an Elasticsearch Cluster can scale horizontally with ease, increasing both its EPS ingest rate and its storage size. This made me believe that Elasticsearch (and the full Elastic Stack) could be the answer to the Metrics Challenge.
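A poller like the one described above could be sketched roughly as follows. This is only an illustration of the shape of the idea, one query per metric class, then one passive status event per host; the thresholds, the class name and the event format are placeholders, not the real implementation.

```python
# Illustrative poller skeleton: turn the per-host results of one big
# query into passive status events. Thresholds and event fields are
# made-up placeholders for this sketch.

WARN, CRIT = 0.80, 0.95  # example thresholds for a ratio-style metric


def classify(value: float) -> str:
    """Map a metric value to an Icinga-style state string."""
    if value >= CRIT:
        return "CRITICAL"
    if value >= WARN:
        return "WARNING"
    return "OK"


def poll_class(metric_class: str, results: dict) -> list:
    """Build one status event per host from one query's results.

    `results` maps host name -> latest metric value, i.e. the flattened
    outcome of a single per-class Elasticsearch query.
    """
    return [
        {"host": host, "service": metric_class, "state": classify(value)}
        for host, value in results.items()
    ]


# Example: one class, two hosts -> two events ready to hand to Tornado.
events = poll_class("cpu_usage", {"web01": 0.42, "db01": 0.97})
```

In the real system these events would be forwarded to Tornado, which then sets the corresponding Icinga 2 service states; the transport is omitted here on purpose.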
While searching for more details, we found out that Elasticsearch was about to release support for Time Series Data indices, promising improved performance and reduced storage consumption. That was the final piece of the puzzle: I immediately set out to design a workable architecture, since I felt I had a promising solution at hand.
Now, I don’t want to spoil the rest of it. The fact is that, after a year of work, we can now say that NetEye can efficiently perform Metric-based Near-Real-Time Monitoring. We fused together Elasticsearch, Tornado, Icinga 2 and passive monitoring strategies to implement metric-based monitoring that’s flexible, scalable and robust: it can tell you when something is going wrong, and can even report when some data in your gigantic Flow of Metrics stops arriving.
And we built it on a fairly standard NetEye deployment (a “common” 3-way cluster). On top of that, it integrates so well with NetEye that, even though it can be considered a heavy and pervasive customization, it doesn’t impact update/upgrade procedures at all.
This has yet to be integrated into our main product, but if you want, we can build it out for you (in much less than a year, of course). And so, NetEye Rules again.