01. 07. 2021 Damiano Chini Log Management, Log-SIEM, NetEye

El Proxy – Error Handling

Starting with NetEye 4.17, the NetEye Log Management module can rely on the new Real Time Log Signing architecture, which aims to overcome some weaknesses of the previous Log Management implementation based on rsyslog.

One of the core components of this architecture is the new El Proxy daemon, whose tasks are to:

  1. Receive the newly generated logs from Logstash
  2. Construct log blockchains from these logs
  3. Index the log blockchain into Elasticsearch
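The post does not describe how El Proxy builds its blockchains. As a rough illustration of the general idea, here is a minimal sketch of a generic hash chain. It assumes SHA-256 over canonical JSON, and the block layout (`log`, `previous_hash`, `hash`) and the all-zero genesis value are hypothetical, not El Proxy's actual format. It also shows why losing a single document matters: verification fails as soon as one link is missing.

```python
import hashlib
import json
from typing import Any

GENESIS_HASH = "0" * 64  # hypothetical starting value for the first link


def _digest(previous_hash: str, log: dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, no spaces) so equal logs always hash equally.
    payload = json.dumps(log, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + payload).encode("utf-8")).hexdigest()


def chain_logs(logs: list[dict[str, Any]], previous_hash: str = GENESIS_HASH) -> list[dict[str, Any]]:
    """Wrap each log in a block storing its predecessor's hash and its own."""
    blocks = []
    for log in logs:
        block_hash = _digest(previous_hash, log)
        blocks.append({"log": log, "previous_hash": previous_hash, "hash": block_hash})
        previous_hash = block_hash
    return blocks


def verify_chain(blocks: list[dict[str, Any]], previous_hash: str = GENESIS_HASH) -> bool:
    """Recompute every link; a missing or altered block breaks the chain."""
    for block in blocks:
        if block["previous_hash"] != previous_hash:
            return False
        if _digest(previous_hash, block["log"]) != block["hash"]:
            return False
        previous_hash = block["hash"]
    return True
```

For example, with `chain = chain_logs([{"message": "a"}, {"message": "b"}, {"message": "c"}])`, `verify_chain(chain)` returns `True`, while `verify_chain([chain[0], chain[2]])` returns `False` because the middle block was dropped.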

Problem

In this context, one of the most critical questions El Proxy has to answer is:

What do we do if we cannot index some documents in Elasticsearch?
  1. Should we discard these documents, thus breaking the blockchain?
  2. Or should we keep retrying to index them, at the risk of causing backpressure in the Real Time Log Signing architecture?

Our Solution

After weighing the pros and cons, we settled on a compromise between these two approaches, where one approach or the other applies depending on the (assumed) root cause of the Elasticsearch indexing error.

The main idea is that if we consider an indexing error recoverable, we keep trying to index the document. This is the case, for example, when Elasticsearch is down: we can assume that sooner or later it will be back up.
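The post does not say how El Proxy decides that an error is recoverable. The following minimal sketch shows one plausible heuristic based on HTTP status codes. No response at all (for example, a refused connection), `429` (which Elasticsearch returns when it rejects requests under load), and `5xx` server errors are treated as transient. Other `4xx` responses, such as the `400` returned for a `mapper_parsing_exception`, are treated as caused by the document. The `send` callable, which returns a status or `None`, is a hypothetical stand-in for the real indexing request.

```python
import time
from typing import Callable, Optional


def is_recoverable(status: Optional[int]) -> bool:
    """Treat outages and overload as transient, other client errors as permanent."""
    # Assumption: None means no HTTP response at all (connection refused,
    # timeout), i.e. Elasticsearch is probably down.
    if status is None:
        return True
    return status == 429 or 500 <= status < 600


def index_until_done(
    send: Callable[[dict], Optional[int]],  # hypothetical indexing call
    document: dict,
    sleep: Callable[[float], None] = time.sleep,
    initial_delay: float = 0.5,
    max_delay: float = 30.0,
) -> Optional[int]:
    """Retry recoverable failures indefinitely with capped exponential backoff.

    Returns the first status that is a success or a non-recoverable error.
    """
    delay = initial_delay
    while True:
        status = send(document)
        if status is not None and 200 <= status < 300:
            return status
        if not is_recoverable(status):
            return status  # e.g. 400: hand over to the non-recoverable path
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

The retry loop has no attempt limit, matching the "keep trying" policy. Its cost is exactly the backpressure risk described above, which the capped backoff only softens.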

If instead the indexing error is non-recoverable, we first try to transform the input data to recover from the error (see below). If this also fails, we finally give up on indexing the document (the log) as part of the blockchain and put it into a Dead Letter Queue.

These choices should avoid unnecessary retries for logs that cannot be indexed, while still keeping the blockchain valid unless keeping it intact is truly impossible.
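The overall policy can be sketched as a small decision loop. This is an illustration under stated assumptions, not El Proxy's code. The `index`, `reduce` and `wait` callables, the `Outcome` enum, and the returned labels are all hypothetical. The dead letter queue is modelled as a plain list, and this sketch stores the original, unreduced document in it.

```python
from enum import Enum, auto
from typing import Any, Callable

Document = dict[str, Any]


class Outcome(Enum):
    INDEXED = auto()
    RECOVERABLE_ERROR = auto()      # e.g. Elasticsearch unreachable
    NON_RECOVERABLE_ERROR = auto()  # e.g. document violates the index mapping


def handle_log(
    document: Document,
    index: Callable[[Document], Outcome],    # hypothetical indexing call
    reduce: Callable[[Document], Document],  # hypothetical minimal-field reducer
    dead_letter_queue: list[Document],
    wait: Callable[[], None] = lambda: None,
) -> str:
    """Index a log, falling back to its minimal form, then to the dead letter queue."""
    candidate, reduced = document, False
    while True:
        outcome = index(candidate)
        if outcome is Outcome.INDEXED:
            return "indexed minimal" if reduced else "indexed"
        if outcome is Outcome.RECOVERABLE_ERROR:
            wait()  # the cause is external: retry the same candidate
            continue
        if not reduced:
            # First non-recoverable error: try once more with minimal fields only.
            candidate, reduced = reduce(document), True
            continue
        # Even the minimal form failed: give up on the blockchain for this log.
        dead_letter_queue.append(document)
        return "dead-lettered"
```

A recoverable error never leads to the dead letter queue. Only two consecutive non-recoverable failures, on the full and then on the reduced document, do.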

Recovering from non-recoverable Elasticsearch indexing errors

By “non-recoverable Elasticsearch indexing error” we mean an error that occurs while indexing a document and is caused by the document itself, for example when the document does not comply with the Elasticsearch index mapping.

In these cases, El Proxy tries to modify the document so that it contains only the minimal set of data required to provide meaningful information to the user and to enable verification of the log blockchain. This minimal information is configured by the Log Manager admin, who specifies which fields of the documents must be kept in these cases.
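Reducing a document to its minimal fields could look roughly like the following sketch. The chain-related field names in `CHAIN_FIELDS` are hypothetical, and the real fields used by El Proxy may differ. The choice to always keep them, even if the admin's list omits them, is also an assumption of this sketch, meant to prevent a misconfiguration from making the chain unverifiable.

```python
from typing import Any, Iterable

# Hypothetical names of the fields needed to verify the blockchain.
CHAIN_FIELDS = frozenset({"hash", "previous_hash", "sequence_number"})


def reduce_to_minimal(document: dict[str, Any], admin_fields: Iterable[str]) -> dict[str, Any]:
    """Keep only admin-configured fields plus the chain-verification fields."""
    keep = CHAIN_FIELDS | frozenset(admin_fields)
    # Only top-level names are matched; nested paths such as "host.name"
    # would need extra handling.
    return {name: value for name, value in document.items() if name in keep}
```

For example, reducing `{"message": "m", "host": "h", "hash": "ab", "previous_hash": "00"}` with the admin fields `["message"]` keeps `message`, `hash` and `previous_hash` and drops `host`.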

Keeping only the minimal fields of a document can prevent Elasticsearch indexing errors: the fewer fields a document has, the lower the chance that one of their types conflicts with the Elasticsearch index mapping.
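A toy type check makes this reasoning concrete. The mapping below is hypothetical, and the check is deliberately strict and simplified. Real Elasticsearch coerces some values by default, for example numeric strings into numeric fields, so this sketch illustrates the principle rather than Elasticsearch's actual validation rules.

```python
from typing import Any

# Hypothetical, heavily simplified index mapping: field name -> field type.
MAPPING = {"message": "text", "status": "long", "hash": "keyword"}

# Python types this strict sketch accepts for each field type.
ACCEPTED = {"text": (str,), "keyword": (str,), "long": (int,)}


def conflicting_fields(document: dict[str, Any], mapping: dict[str, str]) -> list[str]:
    """Return the mapped fields whose value type does not fit the mapping."""
    conflicts = []
    for name, value in document.items():
        expected = mapping.get(name)
        if expected is None:
            continue  # unmapped fields are ignored in this simplified check
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, ACCEPTED[expected]):
            conflicts.append(name)
    return conflicts
```

A document such as `{"message": "login failed", "status": "N/A", "hash": "ab12"}` conflicts only on `status`. If the minimal fields are `message` and `hash`, the reduced document has no conflicts left, because the problematic field is simply gone.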

If you’d like to find out more about El Proxy Error Handling, have a look at the online NetEye User Guide.
