When working with Logstash in production, one of the often-overlooked areas is the Dead Letter Queue (DLQ). This queue stores events that Logstash cannot process, usually due to parsing errors, mapping conflicts, or pipeline misconfigurations.
While the DLQ is useful for troubleshooting, leaving it unmonitored can be dangerous: if it grows unnoticed, critical data might never reach Elasticsearch.
To address this gap, starting from NetEye 4.42 we’ve released a native Python check for Icinga that monitors the Logstash DLQ and provides actionable alerts.
Why Monitoring the DLQ Matters
Data quality: DLQ growth is often a symptom of malformed events or broken pipelines.
Reliability: A large DLQ means that events are silently being diverted away from Elasticsearch instead of being indexed.
Proactive alerting: Instead of discovering issues after missing dashboards or alerts, Icinga notifies you as soon as the DLQ crosses defined thresholds.
In short, monitoring the DLQ helps turn hidden ingestion problems into visible and actionable alerts.
The Python Check
The plugin is written in Python and uses the Logstash Monitoring API (_node/stats/pipelines) to query DLQ statistics for all pipelines. It evaluates usage against configurable thresholds and returns Icinga-compatible output.
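To make the idea concrete, here is a minimal sketch of how DLQ usage could be derived from that endpoint. It is not the plugin's actual source. It assumes the API listens on the default `http://localhost:9600` and that each pipeline's `dead_letter_queue` object exposes `queue_size_in_bytes` and `max_queue_size_in_bytes` (the latter may be missing on older Logstash versions). The function names `fetch_pipeline_stats` and `dlq_usage` are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_pipeline_stats(base_url="http://localhost:9600"):
    """Return the parsed JSON body of GET <base_url>/_node/stats/pipelines."""
    with urlopen(f"{base_url}/_node/stats/pipelines", timeout=10) as resp:
        return json.load(resp)


def dlq_usage(stats):
    """Map each pipeline name to its DLQ usage in percent, or None if unknown."""
    usage = {}
    for name, pipeline in stats.get("pipelines", {}).items():
        dlq = pipeline.get("dead_letter_queue") or {}
        # Assumed field names; treat their absence as "usage not available".
        size = dlq.get("queue_size_in_bytes")
        limit = dlq.get("max_queue_size_in_bytes")
        if size is None or not limit:
            usage[name] = None
        else:
            usage[name] = 100.0 * size / limit
    return usage


# Hand-written sample shaped like the API response (illustrative values only).
sample = {
    "pipelines": {
        "main": {
            "dead_letter_queue": {
                "queue_size_in_bytes": 512,
                "max_queue_size_in_bytes": 1024,
            }
        },
        "packetbeat": {},  # no DLQ statistics reported
    }
}
print(dlq_usage(sample))
```

Keeping the HTTP call separate from the calculation means the percentage logic can be exercised with a canned response, without a running Logstash instance.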
Features
Thresholds for warning and critical levels (in percent).
Per-pipeline or multi-pipeline checks.
Performance data for graphing in Icinga.
Visual usage bar in the output.
Graceful handling of missing or unreachable DLQs.
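The features above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's real code. The 20-character bar width and the round-up fill are guesses, the perfdata label `<pipeline>_dlq` is hypothetical, and mapping an unavailable DLQ to UNKNOWN is one possible reading of "graceful handling". The exit codes and the perfdata layout follow the usual monitoring plugin conventions.

```python
import math

# Standard monitoring plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def usage_bar(percent, width=20):
    """Render a text bar; the filled part is rounded up (an assumption)."""
    filled = max(0, min(width, math.ceil(percent / 100 * width)))
    return "[" + "█" * filled + "-" * (width - filled) + "]"


def evaluate(name, percent, warning, critical):
    """Return (state, status line, perfdata) for one pipeline."""
    if percent is None:
        # Assumed behaviour when the DLQ is missing or unreachable.
        return UNKNOWN, f"UNKNOWN - Pipeline '{name}' DLQ usage not available", ""
    if percent >= critical:
        state = CRITICAL
    elif percent >= warning:
        state = WARNING
    else:
        state = OK
    line = f"{LABELS[state]} - Pipeline '{name}' DLQ at {percent:.2f}% {usage_bar(percent)}"
    # Perfdata layout: 'label'=value[unit];warn;crit;min;max
    perfdata = f"'{name}_dlq'={percent:.2f}%;{warning};{critical};0;100"
    return state, line, perfdata


state, line, perfdata = evaluate("main", 92.31, 70, 90)
print(f"{line} | {perfdata}")
```

A real check would evaluate every pipeline this way and exit with the most severe state, so that Icinga raises the alert even if only one pipeline is affected.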
This check has already been deployed under the NetEye Local Self Monitoring host.
Usage Examples
Checking all pipelines with thresholds at 70% (warning) and 90% (critical) produces output like this:
CRITICAL - Pipeline 'main' DLQ at 92.31% [███████████████████-]
OK - Pipeline 'packetbeat' DLQ at 0.00% [--------------------]
OK - Pipeline 'metricbeat' DLQ at 0.00% [--------------------]
Or check a specific pipeline only:
/neteye/shared/monitoring/plugins/check_logstash_dlq.py --pipeline main --warning 50 --critical 80
Real-World Benefits & Conclusion
Since deploying this check, we have been able to immediately highlight ingestion problems that would otherwise have stayed hidden in the DLQ.
Author
Matteo Cipolletta
I'm an IT professional with a strong knowledge of Security Information and Event Management solutions.
I have proven experience in multiple Enterprise contexts with managing, designing, and administering Security Information and Event Management (SIEM) solutions (including log source management, parsing, alerting and data visualizations), its related processes and on-premises and cloud architectures, as well as implementing Use Cases and Correlation Rules to enable SOC teams to detect and respond to cyber threats.