Native Monitoring of the Logstash Dead Letter Queue
When working with Logstash in production, one often-overlooked area is the Dead Letter Queue (DLQ). This queue stores events that Logstash cannot deliver, usually because Elasticsearch rejects them: for example after a mapping conflict, or when a parsing error or pipeline misconfiguration produces a malformed document.
While the DLQ is useful for troubleshooting, leaving it unmonitored can be dangerous: if it grows unnoticed, critical data might never reach Elasticsearch.
To address this issue, NetEye 4.42 and later include a native Python check for Icinga that monitors the Logstash DLQ and provides actionable alerts.
Why Monitoring the DLQ Matters
Data quality: DLQ growth is often a symptom of malformed events or broken pipelines
Reliability: A large DLQ means events are accumulating on disk instead of reaching Elasticsearch, so valuable data is silently missing downstream
Proactive alerting: Instead of discovering issues after missing dashboards or alerts, Icinga notifies you as soon as the DLQ crosses defined thresholds
In short, monitoring the DLQ helps turn hidden ingestion problems into visible and actionable alerts.
The Python Check
The plugin is written in Python and uses the Logstash Monitoring API (_node/stats/pipelines) to query DLQ statistics for all pipelines. It evaluates usage against configurable thresholds and returns Icinga-compatible output.
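As a rough illustration of the query described above, the following sketch reads the statistics document and derives a usage percentage per pipeline. It is not the plugin's actual code. The URL assumes Logstash's default API port (9600) on the local host, and the field names `queue_size_in_bytes` and `max_queue_size_in_bytes` are assumptions based on what recent Logstash versions report under `dead_letter_queue`. The helper names `fetch_stats` and `dlq_usage` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed endpoint: Logstash's API listens on port 9600 by default.
STATS_URL = "http://localhost:9600/_node/stats/pipelines"


def fetch_stats(url=STATS_URL, timeout=10):
    """Download and decode the pipeline statistics document."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def dlq_usage(stats):
    """Map each pipeline name to its DLQ usage in percent.

    Pipelines without usable DLQ figures (DLQ disabled, fields missing,
    or a zero limit) map to None so the caller can report them separately.
    """
    usage = {}
    for name, pipeline in stats.get("pipelines", {}).items():
        dlq = pipeline.get("dead_letter_queue") or {}
        size = dlq.get("queue_size_in_bytes")    # assumed field name
        limit = dlq.get("max_queue_size_in_bytes")  # assumed field name
        if size is None or not limit:
            usage[name] = None
        else:
            usage[name] = 100.0 * size / limit
    return usage
```

Keeping the HTTP call separate from the parsing makes the percentage logic easy to exercise with a hand-written dictionary, without a running Logstash instance.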
Features
Thresholds for warning and critical levels (in percent)
Per-pipeline or multi-pipeline checks
Performance data for graphing in Icinga
Visual usage bar in the output
Graceful handling of missing or unreachable DLQs
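The features listed above can be sketched in a few lines of Python. This is a minimal sketch, not the plugin's implementation: the function names, the perfdata label, the choice of `>=` for threshold comparison, and the round-down rule for the bar are all assumptions. The exit codes and the `'label'=value%;warn;crit;min;max` perfdata layout follow the standard monitoring plugin conventions that Icinga understands.

```python
# Standard monitoring plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def usage_bar(percent, width=20):
    """Render a fixed-width text bar; the filled part is rounded down (assumption)."""
    clamped = max(0.0, min(100.0, percent))
    filled = int(clamped / 100.0 * width)
    return "[" + "█" * filled + "-" * (width - filled) + "]"


def evaluate(name, percent, warning, critical):
    """Return (exit_code, status_line, perfdata) for one pipeline.

    A percent of None stands for a missing or unreachable DLQ and is
    reported as UNKNOWN instead of raising an error.
    """
    if percent is None:
        return UNKNOWN, f"UNKNOWN - Pipeline '{name}' DLQ statistics unavailable", ""
    if percent >= critical:
        code = CRITICAL
    elif percent >= warning:
        code = WARNING
    else:
        code = OK
    line = f"{LABELS[code]} - Pipeline '{name}' DLQ at {percent:.2f}% {usage_bar(percent)}"
    # Hypothetical perfdata label; min and max are fixed because the value is a percentage.
    perfdata = f"'{name}_dlq_usage'={percent:.2f}%;{warning};{critical};0;100"
    return code, line, perfdata
```

For a multi-pipeline check, a natural design is to evaluate each pipeline separately and exit with the highest code, so a single full DLQ is enough to turn the whole service CRITICAL.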
This check has already been deployed under the NetEye Local Self Monitoring host.
Usage Examples
Check all pipelines with thresholds at 70% (warning) and 90% (critical). The output contains one line per pipeline:
CRITICAL - Pipeline 'main' DLQ at 92.31% [███████████████████-]
OK - Pipeline 'packetbeat' DLQ at 0.00% [--------------------]
OK - Pipeline 'metricbeat' DLQ at 0.00% [--------------------]
Or check a specific pipeline only:
/neteye/shared/monitoring/plugins/check_logstash_dlq.py --pipeline main --warning 50 --critical 80
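The command-line options used above could be handled along the following lines. This is a sketch under assumptions, not the plugin's real argument handling: only the flag names `--pipeline`, `--warning` and `--critical` come from the example, while the default values, the validation rule, and the helper `select_pipelines` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the three flags shown in the usage example."""
    parser = argparse.ArgumentParser(description="Check Logstash DLQ usage (sketch)")
    parser.add_argument("--pipeline", help="check only this pipeline (default: all)")
    # Default thresholds are assumptions chosen for illustration.
    parser.add_argument("--warning", type=float, default=70.0, help="warning threshold in percent")
    parser.add_argument("--critical", type=float, default=90.0, help="critical threshold in percent")
    args = parser.parse_args(argv)
    if args.warning > args.critical:
        parser.error("--warning must not exceed --critical")
    return args


def select_pipelines(usage, pipeline=None):
    """Restrict a {name: percent} mapping to one pipeline, or keep all of them.

    A requested pipeline that is absent maps to None, so it can be
    reported as UNKNOWN rather than silently ignored.
    """
    if pipeline is None:
        return dict(usage)
    return {pipeline: usage.get(pipeline)}
```

Mapping an unknown pipeline name to `None` instead of an empty result means a typo in the service definition surfaces as a visible problem in Icinga.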
Real-World Benefits & Conclusion
Since deploying this check, we’ve been able to immediately see:
Author
Matteo Cipolletta
I'm an IT professional with a strong knowledge of Security Information and Event Management solutions.
I have proven experience in multiple Enterprise contexts with managing, designing, and administering Security Information and Event Management (SIEM) solutions (including log source management, parsing, alerting and data visualizations), its related processes and on-premises and cloud architectures, as well as implementing Use Cases and Correlation Rules to enable SOC teams to detect and respond to cyber threats.