05. 10. 2018 Michele Santuari Log Management, Log-SIEM, NetEye

How an Elasticsearch Cluster Fits in with a NetEye Cluster

This blog post describes the basic architecture of an Elasticsearch cluster in NetEye 4.  Deploying a cluster provides high availability and, where possible, better performance.

NetEye 4’s clustering service is based on Red Hat 7’s High Availability Clustering technologies:

  • Corosync:  Provides group communication between a set of nodes, application restart upon failure, and a quorum system.
  • Pacemaker:  Provides cluster management, lock management, and fencing.
  • DRBD:  Provides data redundancy by mirroring devices (hard drives, partitions, logical volumes, etc.) between hosts in real time.

For Elasticsearch, by contrast, NetEye 4 relies on Elasticsearch’s native clustering capabilities rather than Red Hat’s HA Clustering.  Each NetEye 4 cluster node runs a local master-eligible Elasticsearch service connected to all other nodes.  NetEye 4 automatically deploys and configures the Elasticsearch cluster architecture based on the number of nodes.

To understand the architecture of an Elasticsearch cluster, we need to define two concepts:

  • Index:  A collection of documents that have somewhat similar characteristics.
  • Shard:  The documents in an index can be subdivided into shards and stored on different nodes.  An index can still be accessed transparently, as if all of its shards were stored on a single node.  Shards can be primary or replica, where a replica is a copy of a primary shard kept on at least one other node.
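
As a minimal sketch of how these two concepts appear in practice, the number of primary shards and replicas is part of an index's settings. The helper name `index_settings` and the index name `example-index` are assumptions made for illustration; the curl calls are shown only as comments because they need a running cluster.

```shell
# Hypothetical helper: print the settings body for an index with
# $1 primary shards and $2 replica copies of each primary shard.
index_settings() {
    printf '{"settings":{"index":{"number_of_shards":%d,"number_of_replicas":%d}}}\n' "$1" "$2"
}

# Two primary shards, each with one replica.
index_settings 2 1

# Against a running cluster (not executed here), the body could be used to
# create an index, and the resulting shard placement could then be listed:
#   curl -s -X PUT "127.0.0.1:9200/example-index" \
#       -H 'Content-Type: application/json' -d "$(index_settings 2 1)"
#   curl -s "127.0.0.1:9200/_cat/shards/example-index?v"
```

The `_cat/shards` output lists each shard with its type (primary or replica) and the node it is allocated to.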

The simplest Elasticsearch cluster architecture is the two-node one: both nodes have the same shards allocated, but only one node holds the active primary copy of every shard, while the other holds the replicas.  So if you lose a node, all shards will still be available on the other node.  In the two-node example, Node 1 is the master and holds primary shards 1 and 3, and Node 2 holds a copy of the same shards (replica shards 1 and 3), so it is ready to take over as the primary node if necessary.

The architecture above does not support advanced features such as load balancing when accessing shards; for that, a cluster of at least three nodes is required.  In that architecture, the hardware resources of each node are shared among fewer shards, providing better performance per shard, and both the primary and the replica can serve document retrievals and searches.  By default, NetEye 4 Elasticsearch copies each primary shard to a replica on another node.

You can see in the diagram above that if a node fails, the cluster will not be impacted, because the other nodes hold a complete copy of the lost primary shards (e.g., if Node 1 fails, Node 2 has a replica for shard 1, and Node 3 has one for shard 3).
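
The failure scenario can be checked with a small sketch. The layout below is an assumption reconstructed only from the example in the text (primary shards 1 and 3 on Node 1, replica 1 on Node 2, replica 3 on Node 3); any other shards in the diagram are omitted, and `surviving_shards` is a hypothetical helper, not an Elasticsearch command.

```shell
# Assumed shard layout, one copy per line: <shard> <type> <node>
layout='1 primary node1
1 replica node2
3 primary node1
3 replica node3'

# Print the shards that still have at least one copy after node $1 fails.
surviving_shards() {
    printf '%s\n' "$layout" |
        awk -v failed="$1" '$3 != failed { print $1 }' |
        sort -u |
        paste -s -d ' ' -
}

for node in node1 node2 node3; do
    echo "if $node fails, shards still available: $(surviving_shards "$node")"
done
```

Whichever single node is removed, both shards remain available because every shard has a copy on two different nodes.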

Given this configuration, the minimum number of master-eligible nodes that must remain active after a failure is defined by the quorum: floor(N/2)+1, where N is the number of master-eligible nodes in the cluster (in a NetEye 4 Elasticsearch cluster, all nodes can be elected as master by default).  For example, a three-node cluster has a quorum of 2 and therefore tolerates the loss of one node.  NetEye 4’s clustering automatically configures the quorum setting to prevent split brain (i.e., two master nodes in a single cluster, which can compromise data integrity and, in general, the stability of the cluster).  You can change this setting via the REST API:

# curl -s -X PUT "127.0.0.1:9200/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d '{ "persistent": { "discovery.zen.minimum_master_nodes": <YOUR_QUORUM> } }'
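
To make the quorum formula concrete, here is a minimal sketch that computes floor(N/2)+1 with shell integer arithmetic and builds the matching request body. The function names `quorum` and `quorum_body` are assumptions made for illustration.

```shell
# Hypothetical helper: quorum for $1 master-eligible nodes.
# Shell integer division truncates, which gives floor(N/2) for N >= 0.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: request body for the cluster settings call,
# using the quorum computed for $1 master-eligible nodes.
quorum_body() {
    printf '{ "persistent": { "discovery.zen.minimum_master_nodes": %d } }\n' "$(quorum "$1")"
}

for n in 1 2 3 4 5; do
    echo "master-eligible nodes: $n  quorum: $(quorum "$n")"
done

# Body that could be passed to the settings call with -d for a three-node cluster:
quorum_body 3
```

Note that a two-node cluster has a quorum of 2, so only clusters of three or more master-eligible nodes can lose a node and still keep a quorum.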


Michele Santuari

Software Architect at Wuerth Phoenix
Hi, my name is Michele Santuari and I am a telecommunications engineer who fell in love with OpenFlow, the first attempt at centralized network management, provisioning, and monitoring. I embraced the Software Defined Networking approach and discovered a passion for programming languages. Now I am into Agile methodologies and crazy development process management.
