12. 12. 2022 Mattia Codato NetEye, Unified Monitoring

ClickHouse – High Availability Cluster

As you have surely read from the release notes of NetEye 4.27, we have integrated ClickHouse to be able to use the historical flows and alerts feature of ntopng.

What is ClickHouse?

Directly from the official website:

ClickHouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP). ClickHouse’s performance exceeds all other column-oriented database management systems. It processes billions of rows and tens of gigabytes of data per server per second.

How is ClickHouse Integrated into NetEye?

ClickHouse runs in NetEye as a local service connected with other nodes in the NetEye cluster to ensure a high availability of service.

With the NetEye 4.27 upgrade, ClickHouse is preconfigured with one shard, and all data is replicated on each standard NetEye node. The voting-only node in this case does not hold data but only votes when needed to maintain the quorum.

In /neteye/local/clickhouse-server/ you can find the configuration, logs, and data related to clickhouse-server. Should it become necessary for a particular use case to make changes to the configuration, we recommend creating a new file in /neteye/local/clickhouse-server/conf/config.d. Files in this path are loaded in lexicographic order.

Connecting to ClickHouse

The easiest way to connect from the CLI to ClickHouse is to use the clickhouse-client command, which is already preconfigured to use a TLS connection:

[root@node01 ~]# clickhouse-client 
ClickHouse client version 22.9.2.7 (official build).
Connecting to clickhouse-server.neteyelocal:9440 as user default.
Connected to ClickHouse server version 22.9.2 revision 54460.

node02 :)

We’ve now successfully connected to clickhouse-server.neteyelocal, where an instance of Nginx is listening in load balancer mode, which then forwards us to clickhouse-server on node02.

Inserting Data in ClickHouse

Consider the example of a customization that needs to insert data into ClickHouse. The first thing to do is create a dedicated database.

As I mentioned above, NetEye provides a ClickHouse HA cluster:

node02 :) show clusters

SHOW CLUSTERS

Query id: 46cf26e5-61ac-4d96-854e-91950263df43

┌─cluster─────────────────────────────────────────┐
│ neteye_ha_cluster                               │
└─────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec. 

so you will need to create the database within this cluster

node01 :) CREATE DATABASE mycustomization ON CLUSTER neteye_ha_cluster;

CREATE DATABASE mycustomization ON CLUSTER neteye_ha_cluster

Query id: f2a6d412-62dc-48b3-94ef-35c8962a77ed

┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ neteye02.neteyelocal │ 9440 │      0 │       │                   1 │                0 │
│ neteye01.neteyelocal │ 9440 │      0 │       │                   0 │                0 │
└──────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

2 rows in set. Elapsed: 0.114 sec.

as you can see from the output of the command, the query was executed on both standard NetEye nodes.

Now it’s time to create a new table:

node01 :) use mycustomization;

USE mycustomization

Query id: 65acaa86-07d6-4878-880f-da7494f40aae

Ok.

0 rows in set. Elapsed: 0.002 sec. 

CREATE TABLE mylogs ON CLUSTER neteye_ha_cluster
(
    `date` DateTime,
    `user` String,
    `message` String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/mycustomization/mylogs', '{replica}')
PARTITION BY toYYYYMMDD(date)
ORDER BY date

Query id: 25d929d8-139d-4943-9932-03377b1b3484

┌─host─────────────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ neteye02.neteyelocal │ 9440 │      0 │       │                   1 │                0 │
│ neteye01.neteyelocal │ 9440 │      0 │       │                   0 │                0 │
└──────────────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

2 rows in set. Elapsed: 0.115 sec.

When creating a table you have to pay attention to two things: create it within the cluster via the syntax ON CLUSTER neteye_ha_cluster and the type of engine. To provide data redundancy on each node, the engine type must be ReplicatedMergeTree and must have the following syntax:
ReplicatedMergeTree('/clickhouse/tables/{shard}/<DB_NAME>/<TABLE_NAME>', '{replica}')

The table is ready and you can proceed to insert some initial data:

INSERT INTO mylogs (date, user, message) VALUES ('2022-11-13 00:00:01', 'root', 'User logged in');

Query id: c1507766-4edc-4163-97ff-60ee8a012233

Ok.

1 row in set. Elapsed: 0.016 sec. 

And retrieve the inserted data:

SELECT *
FROM mylogs

Query id: 0c67b927-0a89-4027-ae8e-e40c7cc7245b

┌────────────────date─┬─user─┬─message────────┐
│ 2022-11-13 00:00:01 │ root │ User logged in │
└─────────────────────┴──────┴────────────────┘

1 rows in set. Elapsed: 0.002 sec. 

Thanks to the ReplicatedMergeTree engine type, when fetching via a select you don’t need to worry about which node they were inserted into since they are present on both cluster nodes.

Conclusion

In this short article, we looked at the easiest way to take advantage of the high availability ClickHouse configuration present in NetEye. For advanced configurations and special use cases I recommend the official documentation as a starting point.

These Solutions are Engineered by Humans

Did you find this article interesting? Does it match your skill set? Programming is at the heart of how we develop customized solutions. In fact, we’re currently hiring for roles just like this and others here at Würth Phoenix.

Mattia Codato

Mattia Codato

Software Developer - IT System & Service Management Solutions at Würth Phoenix

Author

Mattia Codato

Software Developer - IT System & Service Management Solutions at Würth Phoenix

Leave a Reply

Your email address will not be published. Required fields are marked *

Archive