24. 12. 2023 Damiano Chini Development, DevOps, Log-SIEM, NetEye

Making ELK Updates Smoother with Configurators and Ansible

Recently (in September 2023) NetEye integrated version 8.8 of the Elastic Stack, which is just one of many Elastic updates brought into NetEye 4.

Since this Elastic update was a major upgrade (from version 7.17) with many breaking changes, we, the NetEye R&D team, wanted to make it as safe and smooth as possible for our users.

In particular, our main goal was to minimize downtime across all components of the Elastic Stack, which should translate into little or no service disruption for NetEye users, and less stressful troubleshooting for the NetEye administrators performing the upgrade.

Why the old NetEye update/upgrade procedure was too restrictive for us

To achieve minimal downtime in the components of the Elastic Stack, a precise procedure must be followed, as explained in the official Elastic documentation. Our problem was that this procedure could not be supported by the way NetEye 4 had handled updates until then, which can be summed up in 2 steps:

  1. Updating, on all nodes, all RPMs present in the NetEye repositories
  2. Executing the NetEye Secure Install script on all nodes

So why is this procedure unsuitable for updating the Elastic Stack? Well, for 2 main reasons:

  1. Updating RPMs of (for example) Elasticsearch on all nodes at once may be risky due to possible incompatibilities with the new version: if some configuration is not compatible with the new version, you’d be in a situation where all the nodes are in an incorrect state, which may lead to broken services for users.
  2. Updating the RPMs of all Elastic Stack components at once means that some components may be down for a long part of the update/upgrade. For example, the Kibana UI is inaccessible from the moment you update the RPM until the moment you configure and restart the service, which happened only towards the end of the NetEye Secure Install run.

How did we modify the NetEye update/upgrade procedure?

To overcome the problems we just discussed, we decided to introduce the concept of “Configurators” for NetEye components. The configurator of a NetEye component is in charge of updating the packages of that component and configuring it, so that each component is updated separately and can define the exact procedure by which it should be updated.

So we introduced an “Elastic Stack Configurator” (and later an “Icinga 2 Configurator”), which is an Ansible procedure that has complete control over how the Elastic Stack components must be updated and configured on the NetEye Nodes. The configurators are integrated into the NetEye Update and Upgrade commands, which run them first, right after updating the NetEye repository definitions.

Writing the configurator in Ansible gives us a simple way to define how the components must be configured on the various nodes, thanks to the declarative nature of the language and the ease with which it manages the multiple nodes of our NetEye Clusters.
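
To make the configurator idea more concrete, here is a minimal sketch in Python rather than Ansible. It is not NetEye's actual implementation: the `Configurator` class, the `fake_update` helper and the node names are all hypothetical, and the one-node-at-a-time strategy is an assumption chosen to show how a component-specific procedure can avoid leaving every node in a broken state.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Configurator:
    """Hypothetical model of a per-component configurator (not NetEye's real code)."""
    component: str
    # Callback that updates one node and returns True if it is healthy afterwards.
    update_node: Callable[[str], bool]
    updated: List[str] = field(default_factory=list)

    def run(self, nodes: List[str]) -> bool:
        """Update nodes one at a time and stop at the first failure."""
        for node in nodes:
            if not self.update_node(node):
                # Nodes not yet touched keep the old, working version.
                return False
            self.updated.append(node)
        return True


def fake_update(node: str) -> bool:
    # Stand-in for "update packages, configure, restart, check health".
    # Here we simply pretend that node2 fails after the update.
    return node != "node2"


elastic = Configurator("elasticsearch", fake_update)
ok = elastic.run(["node1", "node2", "node3"])
print(ok, elastic.updated)
```

In this sketch the failure on `node2` stops the procedure before `node3` is touched, which is the kind of control a single "update all RPMs everywhere" step cannot offer.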

Next steps

At the time of writing, the configurators still run sequentially, one after the other, which means that using them does not yet shorten updates/upgrades.

Nonetheless, this was a first step towards making the whole update/upgrade procedure faster, because we now have isolated update procedures for each NetEye component, where previously they were all entangled. This means that, to make NetEye updates/upgrades faster, we can now run the configurators in parallel when there are no dependencies between components. But this is an improvement that will be evaluated in the future, so stay tuned!
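
As a rough illustration of what running configurators in parallel could look like, the following Python sketch schedules them in "waves": every configurator whose dependencies are already done runs concurrently with the others in its wave. This is only a sketch of the general technique, not a planned NetEye design. The `run_configurators` function and the dependency map (including the assumption that Kibana waits for Elasticsearch) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


def run_configurators(deps, run_one, max_workers=4):
    """Run configurators in dependency order, in parallel where possible.

    deps maps a component name to the set of components it depends on.
    Returns the list of "waves"; components in one wave ran concurrently.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Everything whose dependencies are already finished.
            ready = sorted(sorter.get_ready())
            # Consuming the iterator waits for the whole wave to complete.
            list(pool.map(run_one, ready))
            sorter.done(*ready)
            waves.append(ready)
    return waves


# Hypothetical dependency map, for illustration only.
deps = {
    "elasticsearch": set(),
    "icinga2": set(),
    "kibana": {"elasticsearch"},
}
finished = []
waves = run_configurators(deps, finished.append)
print(waves)
```

With this map, the two independent configurators share the first wave and the dependent one runs afterwards, so the total duration is bounded by the slowest configurator in each wave rather than by the sum of all of them.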
