01. 10. 2021 Andrea Avancini Business Service Monitoring, NetEye

Hosts and NetEye Upgrade

The topology of a NetEye installation can change over time: hosts of various types can be added to or removed from a cluster, for example in response to changing business demands or customer requirements.

In a cluster environment, hosts are mapped manually in a file called /etc/neteye-cluster. This file works as a sort of static inventory: it is the single source of truth about the NetEye cluster topology, and it is kept synchronized across all the hosts belonging to the cluster.

An example of an /etc/neteye-cluster file is the following:

{
   "Hostname" : "neteye-cluster.myneteye.cluster",
   "Nodes" : [
      {
         "addr" : "192.168.47.1",
         "hostname" : "neteye01.neteyelocal",
         "hostname_ext" : "neteye-cluster01.myneteye.cluster",
         "id" : 1
      },
      {
         "addr" : "192.168.47.2",
         "hostname" : "neteye02.neteyelocal",
         "hostname_ext" : "neteye-cluster02.myneteye.cluster",
         "id" : 2
      }
   ],
   "ElasticOnlyNodes": [
      {
         "addr" : "192.168.47.3",
         "hostname" : "neteye03.neteyelocal",
         "hostname_ext" : "neteye-cluster03.myneteye.cluster"
      }
   ],
   "VotingOnlyNode" : {
      "addr" : "192.168.47.4",
      "hostname" : "neteye04.neteyelocal",
      "hostname_ext" : "neteye-cluster04.myneteye.cluster",
      "id" : 3
   }
}

NetEye Nodes

As you can see, the file describes the NetEye cluster and the attributes of its hosts, grouped by type. The available types are:

Nodes, which are standard NetEye nodes that form the backbone of the Red Hat cluster architecture of NetEye 4;

ElasticOnlyNodes, which are nodes that run Elasticsearch only and form the Elastic cluster;

VotingOnlyNode, a single node whose only purpose is to provide quorum for DRBD, PCS, and Elasticsearch.

By default, however, the neteye upgrade and neteye update commands do not know which hosts make up the whole NetEye environment. So how can we identify which hosts to upgrade, and distinguish them according to their type?

We can rely on the /etc/neteye-cluster file to map the hosts and their types to host groups, which our commands can then use to manage the update/upgrade procedures.

Hosts and Patterns

The update/upgrade procedures are performed by Ansible playbooks, and Ansible works against multiple nodes at the same time, using a list or group of lists of nodes known as an inventory. We can then use so-called patterns to select the hosts or host groups to run our commands against.

To map the hosts in the /etc/neteye-cluster file to host groups, we use a dynamic inventory script, which is invoked at run time whenever the Ansible playbooks that perform the update/upgrade operations are called. The script can also be executed from the command line on any NetEye installation:

[root@neteye01 ~]# python /usr/share/neteye/scripts/upgrade/dynamic-inventory.py --list | jq

Considering our example, the command produces the following output:

{
  "_meta": {
    "hostvars": {
      "neteye03.neteyelocal": {
        "internal_node_addr": "192.168.47.3"
      },
      "neteye02.neteyelocal": {
        "internal_node_addr": "192.168.47.2"
      },
      "neteye01.neteyelocal": {
        "internal_node_addr": "192.168.47.1"
      },
      "neteye04.neteyelocal": {
        "internal_node_addr": "192.168.47.4"
      }
    }
  },
  "all": {
    "vars": {
      "cluster": "true",
      "ansible_ssh_user": "root"
    }
  },
  "voting_nodes": {
    "hosts": [
      "neteye04.neteyelocal"
    ]
  },
  "nodes": {
    "hosts": [
      "neteye04.neteyelocal",
      "neteye01.neteyelocal",
      "neteye02.neteyelocal"
    ]
  },
  "es_nodes": {
    "hosts": [
      "neteye03.neteyelocal"
    ]
  }
}
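To make the mapping more concrete, here is a minimal sketch of a dynamic inventory script in Python that turns an /etc/neteye-cluster file into the group layout shown above. It is not the actual /usr/share/neteye/scripts/upgrade/dynamic-inventory.py: the function name build_inventory and the way the groups are filled are assumptions, modeled only on the output above (including the fact that the voting node also appears in nodes). It follows Ansible's dynamic inventory convention of answering --list with the whole inventory and --host with the variables of a single host.

```python
#!/usr/bin/env python3
# Hypothetical sketch of an Ansible dynamic inventory built from
# /etc/neteye-cluster. NOT the script shipped with NetEye: group names and
# variables only mirror the --list output shown above.
import argparse
import json

CLUSTER_FILE = "/etc/neteye-cluster"


def build_inventory(cluster):
    """Map the host types of a neteye-cluster dict to Ansible host groups."""
    nodes = [n["hostname"] for n in cluster.get("Nodes", [])]
    es_nodes = [n["hostname"] for n in cluster.get("ElasticOnlyNodes", [])]
    voting = cluster.get("VotingOnlyNode")  # a single object, not a list
    voting_nodes = [voting["hostname"]] if voting else []

    # Every host exposes its internal address as a host variable.
    entries = list(cluster.get("Nodes", [])) + list(cluster.get("ElasticOnlyNodes", []))
    if voting:
        entries.append(voting)
    hostvars = {e["hostname"]: {"internal_node_addr": e["addr"]} for e in entries}

    return {
        "_meta": {"hostvars": hostvars},
        "all": {"vars": {"cluster": "true", "ansible_ssh_user": "root"}},
        # As in the output above, the voting node is listed in "nodes" too.
        "nodes": {"hosts": voting_nodes + nodes},
        "es_nodes": {"hosts": es_nodes},
        "voting_nodes": {"hosts": voting_nodes},
    }


def main():
    parser = argparse.ArgumentParser(description="neteye-cluster inventory sketch")
    parser.add_argument("--list", action="store_true", help="print the whole inventory")
    parser.add_argument("--host", help="print the variables of a single host")
    args = parser.parse_args()

    if args.host:
        # Host variables are already returned under _meta, so nothing to add.
        print(json.dumps({}))
    elif args.list:
        with open(CLUSTER_FILE) as f:
            print(json.dumps(build_inventory(json.load(f)), indent=2))
    else:
        parser.print_usage()


if __name__ == "__main__":
    main()
```

Saved as an executable file (for example, a hypothetical dynamic_inventory_sketch.py), it could be passed to ansible-playbook with -i, just like the real script. Returning host variables under _meta lets Ansible skip one --host call per host, which is why --host can simply return an empty object here.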

We can use patterns over the groups of our dynamic inventory to run commands against specific host groups only.

For example, let’s imagine that we want to know which Linux distribution is currently running on the ElasticOnly nodes of our NetEye cluster. We can write a very simple Ansible playbook, called es_node_only.yml, like the following:

- hosts: es_nodes
  any_errors_fatal: true
  gather_facts: true

  tasks:
  - name: print the Linux distro on es nodes only
    debug:
        msg: "{{ ansible_distribution }}-{{ ansible_distribution_version }}"

Note that, in the hosts line of the playbook, we target a specific pattern, es_nodes, so that the tasks in our playbook are executed on the ElasticOnly nodes only.

Indeed, if we execute the playbook, we obtain the following:

[root@neteye01 ~]# ansible-playbook -i /usr/share/neteye/scripts/upgrade/dynamic-inventory.py es_node_only.yml

PLAY [es_nodes] ***********************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************************************************************************************************************
ok: [neteye03.neteyelocal]

TASK [print the Linux distro on es nodes only] ****************************************************************************************************************************************************************************************
ok: [neteye03.neteyelocal] => {
    "msg": "CentOS-7.9"
}

PLAY RECAP ****************************************************************************************************************************************************************************************************************************
neteye03.neteyelocal       : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

As you can see, the playbook has been run against neteye03.neteyelocal, our only ElasticOnly node, skipping all the other nodes.

The neteye upgrade command uses this very same mechanism to operate on the NetEye cluster during the upgrade procedure. For instance, we use patterns like nodes,!voting_nodes,!es_nodes to identify the hosts on which PCS commands can be run.

In our example, the pattern nodes matches the hosts

[neteye04.neteyelocal,neteye01.neteyelocal,neteye02.neteyelocal]

but the ! in front of voting_nodes and es_nodes explicitly excludes the hosts of those two groups from the playbook execution.

The resulting list of hosts is therefore composed of all the hosts in nodes except those that also belong to voting_nodes or es_nodes. Since neteye04.neteyelocal is also a member of voting_nodes, it is excluded from the execution of the playbook, which is limited to [neteye01.neteyelocal,neteye02.neteyelocal].
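The pattern logic can be sketched in a few lines of Python. This is a simplified, hypothetical model (resolve_pattern is not an Ansible function): it only handles comma-separated group or host names and ! exclusions, and, as Ansible does, it applies all exclusions after collecting the included hosts. Real Ansible patterns also accept : as a separator, & for intersections, wildcards, and regular expressions.

```python
def resolve_pattern(pattern, groups):
    """Return the hosts matched by `pattern`, in inventory order (toy model).

    pattern: comma-separated terms, e.g. "nodes,!voting_nodes,!es_nodes".
    groups: mapping from group name to the list of its hosts.
    """
    included, excluded = [], set()
    for term in pattern.split(","):
        term = term.strip()
        if not term:
            continue
        is_exclusion = term.startswith("!")
        name = term[1:] if is_exclusion else term
        # A known group expands to its hosts; anything else is a single host.
        hosts = groups.get(name, [name])
        if is_exclusion:
            excluded.update(hosts)
        else:
            for host in hosts:
                if host not in included:
                    included.append(host)
    # Exclusions are applied last, whatever their position in the pattern.
    return [h for h in included if h not in excluded]


groups = {
    "nodes": ["neteye04.neteyelocal", "neteye01.neteyelocal", "neteye02.neteyelocal"],
    "voting_nodes": ["neteye04.neteyelocal"],
    "es_nodes": ["neteye03.neteyelocal"],
}
print(resolve_pattern("nodes,!voting_nodes,!es_nodes", groups))
# → ['neteye01.neteyelocal', 'neteye02.neteyelocal']
```

Keeping the included hosts in a list rather than a set preserves the inventory order, which matters whenever a procedure treats the first host of a group specially.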

Trick – Avoid putting the entire cluster on standby during the upgrade

Imagine a NetEye cluster described by the following /etc/neteye-cluster file:

{
   "Hostname" : "neteye-cluster.myneteye.cluster",
   "Nodes" : [
      {
         "addr" : "192.168.47.1",
         "hostname" : "neteye01.neteyelocal",
         "hostname_ext" : "neteye-cluster01.myneteye.cluster",
         "id" : 1
      },
      {
         "addr" : "192.168.47.2",
         "hostname" : "neteye02.neteyelocal",
         "hostname_ext" : "neteye-cluster02.myneteye.cluster",
         "id" : 2
      },
      {
         "addr" : "192.168.47.3",
         "hostname" : "neteye03.neteyelocal",
         "hostname_ext" : "neteye-cluster03.myneteye.cluster",
         "id" : 3
      }
   ]
}

Suppose that, for some reason, node neteye01.neteyelocal must be kept on standby so that no cluster resources are allocated on it, with the other two nodes sharing the whole workload.

During the upgrade to NetEye 4.20, however, the neteye upgrade command puts all the nodes on standby, disrupting the regular NetEye operations.

Why?

This happens because of a peculiarity of this cluster: the first node of the nodes host group is already on standby. During the upgrade procedure, the neteye upgrade command elects that very first node of the host group as the always-active node and puts all the other nodes on standby. Since the elected node is itself on standby, no node is left active.
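The failure mode can be reproduced with a toy model. The sketch below is hypothetical and not taken from the neteye upgrade code: nodes_left_active is an illustrative name, and the function only encodes the behaviour described above, namely that the first node of the nodes group is elected as always active, all the other nodes are put on standby, and nodes already on standby stay there.

```python
def nodes_left_active(nodes, already_standby):
    """Return the nodes that remain active during the upgrade (toy model).

    nodes: hostnames in the order of the Nodes list in /etc/neteye-cluster.
    already_standby: hostnames that an administrator keeps on standby.
    """
    elected = nodes[0]  # the first node is elected as always-active
    # The upgrade puts every non-elected node on standby...
    standby = {n for n in nodes if n != elected}
    # ...and nodes already on standby, the elected one included, stay there.
    standby |= set(already_standby)
    return [n for n in nodes if n not in standby]


original_order = ["neteye01.neteyelocal", "neteye02.neteyelocal", "neteye03.neteyelocal"]
reordered = ["neteye02.neteyelocal", "neteye03.neteyelocal", "neteye01.neteyelocal"]
print(nodes_left_active(original_order, {"neteye01.neteyelocal"}))  # → []
print(nodes_left_active(reordered, {"neteye01.neteyelocal"}))  # → ['neteye02.neteyelocal']
```

In this model, the whole cluster goes down exactly when the first node of the list is one of the nodes already on standby, so the fix only needs to change which node comes first.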

To solve the issue, you can reorder the nodes in /etc/neteye-cluster so that a node that is not on standby comes first, as follows:

{
   "Hostname" : "neteye-cluster.myneteye.cluster",
   "Nodes" : [
      {
         "addr" : "192.168.47.2",
         "hostname" : "neteye02.neteyelocal",
         "hostname_ext" : "neteye-cluster02.myneteye.cluster",
         "id" : 2
      },
      {
         "addr" : "192.168.47.3",
         "hostname" : "neteye03.neteyelocal",
         "hostname_ext" : "neteye-cluster03.myneteye.cluster",
         "id" : 3
      },
      {
         "addr" : "192.168.47.1",
         "hostname" : "neteye01.neteyelocal",
         "hostname_ext" : "neteye-cluster01.myneteye.cluster",
         "id" : 1
      }
   ]
}

This way, the neteye upgrade command will elect neteye02.neteyelocal as the always-active node, preserving the functionality of the NetEye cluster during the upgrade.
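If you prefer not to reorder the entries by hand, a small script can do it. The sketch below is a hypothetical helper (move_standby_nodes_last and reorder_cluster_file are illustrative names, not NetEye tools) that moves the given standby nodes to the end of the Nodes list, keeping each node's attributes, id included, unchanged. It returns the new JSON instead of overwriting the file, so you can review it first; remember that /etc/neteye-cluster must stay synchronized across all the cluster hosts.

```python
import json


def move_standby_nodes_last(cluster, standby_hostnames):
    """Return a copy of `cluster` whose Nodes list has standby nodes at the end.

    The relative order of the other nodes is preserved, and every node keeps
    its own attributes (addr, hostname, hostname_ext, id).
    """
    nodes = cluster.get("Nodes", [])
    active = [n for n in nodes if n["hostname"] not in standby_hostnames]
    standby = [n for n in nodes if n["hostname"] in standby_hostnames]
    # Shallow copy: the dict passed in is left untouched.
    return {**cluster, "Nodes": active + standby}


def reorder_cluster_file(path, standby_hostnames):
    """Read a neteye-cluster file and return the reordered JSON as a string."""
    with open(path) as f:
        cluster = json.load(f)
    reordered = move_standby_nodes_last(cluster, standby_hostnames)
    # indent=3 mirrors the indentation used in the examples above.
    return json.dumps(reordered, indent=3)
```

For the cluster above, print(reorder_cluster_file("/etc/neteye-cluster", {"neteye01.neteyelocal"})) would print the same node order as the reordered file shown, with neteye02.neteyelocal first.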

Andrea Avancini

DevOps Engineer at Würth Phoenix
I love understanding how things work, how they can be automated, and how to apply new technologies when needed. I am passionate about technology, open-source software, and security, and I found Würth Phoenix the right place for this. In the past, I co-founded a cybersecurity startup that produces security solutions for mobile apps and blockchain. Previously, I worked as a researcher at Fondazione Bruno Kessler of Trento. My research was mainly focused on web and mobile app security and testing. I got my PhD in Computer Science at the University of Trento.
