If a database goes down unexpectedly, it causes a domino effect on the systems that depend on it: the web shop won’t load, CRM freezes, and business processes get disrupted. NetEye is no different – if MariaDB fails, we lose access to WebUI, logins, Grafana dashboards, Assets management, and all other related functions.
That’s why we replaced the single database server, which ran on a PCS-DRBD pair, with a Galera cluster. This setup avoids a full database shutdown and restart, and in many cases prevents even short-term outages of dependent services.
For the Galera cluster to perform well, we need to watch several factors, including CPU, memory, and disk I/O. It’s unwise to make backups during a busy day since physical backups can overload nodes, causing synchronization loss or temporary cluster freezing.
Quieter periods like nighttime or early morning are better for backups – if those times exist. But what if we want more frequent, fresher backups, say every four hours, without heavily taxing the active nodes?
In this post, I’ll show you how to set up a dedicated, read-only replica. You can back up this replica at any time, as often as you want, without interrupting the operational nodes.
Proper configuration is crucial for reliable replication and high availability in a Galera cluster. Let’s begin with the setup.
First, create a dedicated user to initiate and manage the replication process:
CREATE USER 'replicant'@'%' IDENTIFIED BY '<PASSWORD>';
GRANT REPLICATION SLAVE ADMIN ON *.* TO 'replicant'@'%';
FLUSH PRIVILEGES;
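You can double-check the result before moving on (the exact output varies by MariaDB version):
SHOW GRANTS FOR 'replicant'@'%';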
Apply the following configuration changes to all operational nodes in the cluster. No changes are required on a dedicated arbitrator node, if you’re using one.
Create a new file, e.g. vim /neteye/local/mariadb/conf/my.cnf.d/replica.cnf, and copy in the following:
[mysqld]
log_slave_updates = ON
# wsrep_gtid_domain_id and server-id need to have the
# same value on all nodes in the cluster.
server-id = 01
wsrep_gtid_domain_id = 0
wsrep_gtid_mode = ON
log-bin = /neteye/local/mariadb/log/master-bin
log-bin-index = /neteye/local/mariadb/log/master-bin.index
# Should be set to a different value on all nodes in a given cluster,
# and each of these values should be different than the configured wsrep_gtid_domain_id value.
# E.g.: node1= 10 and node2=11
gtid_domain_id = 10
expire_logs_days = 1
The replica-related settings won’t be activated by a rolling restart of the nodes; you have to restart the entire cluster. In practice, shut down the nodes one by one, then bootstrap from the node with the most advanced state.
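As a rough sketch – assuming MariaDB runs as a plain systemd service and grastate.dat lives under /neteye/local/mariadb/data (both assumptions on my part; your NetEye tooling may wrap these steps differently):
# On each node, one at a time:
systemctl stop mariadb
# On the node with the most advanced state, confirm it's safe to bootstrap from here:
grep safe_to_bootstrap /neteye/local/mariadb/data/grastate.dat    # expect: safe_to_bootstrap: 1
# Bootstrap the cluster from that node:
galera_new_cluster
# Then start MariaDB normally on the remaining nodes:
systemctl start mariadb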
Here’s a brief explanation of the settings:
- [mysqld]: This section indicates that the settings below apply to the MariaDB server daemon (mysqld).
- log_slave_updates = ON: Allows a replication “slave” (secondary) server to also record in its own binary log the changes it has received and applied from the “master” (primary) server.
- server-id = 01: Gives a unique identifier to the MariaDB cluster in the replication topology. Each server/cluster must have a unique server-id, but within the cluster the operational nodes must share the same identifier, so all are 01.
- wsrep_gtid_domain_id = 0: The GTID (Global Transaction Identifier) domain identifier associated with Galera Cluster replication. It is used independently of gtid_domain_id in the Galera environment. As with server-id, the operational nodes must share the same ID, so all are 0.
- wsrep_gtid_mode = ON: Turns on Galera Cluster GTID mode, which allows transactions to be uniquely identified and tracked across the cluster.
- log-bin: Specifies the base name and path of the binary log files. MariaDB records all database modification operations in these files, which is essential for replication and point-in-time recovery.
- log-bin-index: Specifies the path and name of the binary log index file, which stores the list and order of binary log files.
- gtid_domain_id = 10: The GTID domain ID associated with traditional MariaDB replication. Each replication domain requires a unique ID to avoid conflicts, so here the nodes must differ. For example, one operational node is 10, the other is 11, and the replica is 12.
- expire_logs_days = 1: Specifies how many days MariaDB should keep binary log files before automatically deleting them. This helps with disk space management.
A quick note: GTID is a MariaDB-specific addition. It’s not essential, but its use greatly simplifies configuration and operation.
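For reference, a MariaDB GTID is written as domain_id-server_id-sequence, so with the values above an event originating on the first node might look like 10-1-12345 (an illustrative value, not from this setup). You can inspect the current position on any node with:
SELECT @@gtid_current_pos;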
For the replica we will use a dedicated, separate virtual machine (VM). This ensures complete isolation from the cluster, allowing us to make backups at any time without disturbing the production environment.
In order for requests from the replica running outside the cluster to reach the “internal” MariaDB address, we also have to configure the firewall appropriately. We solved this problem with the following command:
firewall-cmd --permanent --zone=public --add-rich-rule="rule family=\"ipv4\" source address=\"<SLAVE-IP>\" forward-port port=\"30306\" protocol=\"tcp\" to-port=\"3306\" destination address=\"<CLUSTER-IP>\""
firewall-cmd --permanent --add-port=30306/tcp && firewall-cmd --reload
So we enabled an “external” port (30306) that only the replica can connect to. The keepalive configured on our operational nodes ensures that the slave node can always reach a Galera node via the Cluster IP to start and maintain the replication process.
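Before relying on it, you can check that the rules landed where you expect with the standard firewalld queries:
firewall-cmd --zone=public --list-rich-rules
firewall-cmd --query-port=30306/tcp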
For simplicity (and because I like it), I use Podman for the whole process on a Red Hat VM, so I’ll show you how to configure it.
Installing the required packages:
dnf install zstd podman mariadb-backup mbstream
Configure the MariaDB replica instance for the Podman container:
mkdir -p /srv/podman/mariadb/{conf,data,log}
Note: this is where the entire database and its logs will be stored, so make sure the slave node has enough storage space to hold all the data from your database.
Create a file /etc/containers/systemd/podman-mariadb-slave.container with the following content:
[Unit]
Description=MariaDB Slave Service
After=network-online.target
Wants=network-online.target
[Container]
ContainerName=podman-mariadb-slave
Image=localhost/mariadb:10.11.10
PublishPort=3308:3306
# Volume
Volume=/srv/podman/mariadb/conf/replica_slave.cnf:/etc/mysql/conf.d/replica.cnf
Volume=/srv/podman/mariadb/data:/srv/podman/mariadb/data
Volume=/srv/podman/mariadb/log:/srv/podman/mariadb/log
Exec=--datadir=/srv/podman/mariadb/data
[Service]
Restart=on-failure
TimeoutStopSec=70
[Install]
WantedBy=default.target
Check it with the command below, but don’t run the container yet as there are still some configuration steps to do first.
/usr/libexec/podman/quadlet -dryrun
If you don’t have a relatively recent backup, take a physical backup of your primary MariaDB server. It’s important to include the “replicant” user.
I use the nightly backup, which is compressed, so I uncompress it first:
zstd -d /tmp/galera-backup.zstd -c | mbstream -x -C /srv/podman/mariadb/data/
Then “prepare” the data:
mariabackup --prepare --target-dir=/srv/podman/mariadb/data
Set the ownership to 999:999 – this is the mysql user inside the container:
chown -R 999:999 /srv/podman/mariadb/data
After preparation you can get the binlog status, which you need to start the replication. For example:
cat /srv/podman/mariadb/data/xtrabackup_binlog_pos_innodb
/neteye/local/mariadb/log/master-bin.000160 858584413
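If you want to script this step, the file contains just the binlog path and position separated by whitespace, so it’s easy to split in the shell (a small convenience sketch, not part of the original procedure):
# read the two whitespace-separated fields into variables
read BINLOG_PATH BINLOG_POS < /srv/podman/mariadb/data/xtrabackup_binlog_pos_innodb
# strip the directory part to get the value for MASTER_LOG_FILE
echo "MASTER_LOG_FILE=${BINLOG_PATH##*/} MASTER_LOG_POS=${BINLOG_POS}"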
Now you can configure the replica (vim /srv/podman/mariadb/conf/replica_slave.cnf – the host-side file name that the Volume line above mounts into the container as replica.cnf):
[mysqld]
server-id = <Must be unique number different from other nodes e.g.: 99>
relay-log-index = /srv/podman/mariadb/log/slave-relay-bin.index
relay-log = /srv/podman/mariadb/log/slave-relay-bin
log-bin = /srv/podman/mariadb/log/slave-bin
log-bin-index = /srv/podman/mariadb/log/slave-bin.index
binlog_format = mixed
log_slave_updates = ON
expire_logs_days = 1
read-only = 1
gtid_domain_id = <Must be unique number different from other nodes e.g. 12>
Note that server-id and gtid_domain_id must be unique. For instance, if your master nodes use server-id 01 with gtid_domain_id 10 or 11, assign different values to the replica.
Next, start the podman service:
systemctl daemon-reload
systemctl start podman-mariadb-slave.service
Tip: If the service fails to start, start the podman container “manually” to debug:
podman run \
--name podman-mariadb-slave \
-p 3308:3306 \
-v /srv/podman/mariadb/conf/replica_slave.cnf:/etc/mysql/conf.d/replica.cnf:Z \
-v /srv/podman/mariadb/data:/srv/podman/mariadb/data:Z \
-v /srv/podman/mariadb/log:/srv/podman/mariadb/log:Z \
localhost/mariadb:10.11.10 \
--datadir=/srv/podman/mariadb/data
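In either case, the container logs are usually the quickest way to spot a bad config or a permission problem:
podman logs -f podman-mariadb-slave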
I use the same data directory path in the container because mariabackup, which runs outside the container, will look for this directory – so there’s no need to override it separately in the config.
Connect to MariaDB in the container and start replication, first with binlog position:
mysql -h 127.0.0.1 -P 3308 -u <admin user> -p
CHANGE MASTER TO
MASTER_HOST = '<CLUSTER IP>',
MASTER_PORT = 30306,
MASTER_USER = 'replicant',
MASTER_PASSWORD = '<password>',
MASTER_LOG_FILE = 'master-bin.000160', # xtrabackup_binlog_pos_innodb
MASTER_LOG_POS = 858584413, # xtrabackup_binlog_pos_innodb
MASTER_USE_GTID = NO;
START SLAVE;
# wait 20-30 sec then stop
STOP SLAVE;
## change to GTID
CHANGE MASTER TO MASTER_USE_GTID = slave_pos;
# start again
START SLAVE;
This starts the replication, which you can verify with the query below:
SHOW SLAVE STATUS\G
If you see “Slave_IO_Running: Yes” and “Slave_SQL_Running: Yes”, your replica is up and running.
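For a quick health check from the shell, you can filter the status output down to the fields that matter (the admin user is a placeholder, as above):
mysql -h 127.0.0.1 -P 3308 -u <admin user> -p -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"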
Creating a backup strategy depends on individual preferences, but I’ll present my own approach.
In the case of the Galera cluster, I schedule the backup for nighttime hours when the load is lower, so only one full backup is made per day. On the replication server, however, I can vary it more freely, as it doesn’t interfere with the operation of the cluster; here, backups are possible every 4-6 hours, or even hourly.
I chose six backups per day, i.e. one every four hours. I keep these backups on local storage for three days; I don’t need to worry about long-term retention here, as the backups are copied to the dedicated backup server every day.
Here are the backup script steps in a nutshell: take a physical backup from the replica instance, compress it, and clean up local copies older than three days.
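A minimal sketch of such a script, under stated assumptions: the backup directory (/srv/backups), the credential placeholders, the script path, and the cron schedule are mine, not from the original setup. It works because mariabackup runs on the replica host and can read the container’s datadir, since the paths match.
#!/bin/bash
set -euo pipefail

BACKUP_DIR=/srv/backups            # hypothetical local backup directory
TS=$(date +%Y%m%d-%H%M)

# Stream a physical backup from the replica instance (port 3308) and compress it
mariabackup --backup \
    --host=127.0.0.1 --port=3308 \
    --user=<admin user> --password='<PASSWORD>' \
    --datadir=/srv/podman/mariadb/data \
    --stream=mbstream | zstd > "${BACKUP_DIR}/replica-backup-${TS}.zstd"

# Keep only three days of local backups
find "${BACKUP_DIR}" -name 'replica-backup-*.zstd' -mtime +3 -delete
Scheduled from cron every four hours, e.g.:
0 */4 * * * /usr/local/bin/replica-backup.sh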
With this solution, we’ve made backing up the MariaDB Galera cluster much safer and more flexible. By introducing a dedicated read-only replica, we can now run consistent backups as often as every four hours without putting any load on the active cluster nodes. This eliminates the risk of synchronization lag or database “freezes”.
The benefits go beyond just backups. Because the replica is a real-time mirror of production, it’s the perfect place to run heavy reports or queries, keeping production fast and responsive.
Finally, this setup is vital for business continuity. If a critical error occurs, the replica can be promoted to a live production database in seconds, drastically reducing downtime. This makes the dedicated replica an essential part of any Disaster Recovery plan.
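The exact promotion procedure depends on your failover tooling, but at the database level it boils down to something like this (a hedged sketch, assuming you redirect clients to the replica yourself):
STOP SLAVE;
RESET SLAVE ALL;            -- forget the old master
SET GLOBAL read_only = 0;   -- allow writes on the promoted node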
Although setting up the firewall rules and Podman containers takes some initial effort, the payoff is a stable, scalable, and fully automated system.
Useful links:
Using MariaDB Replication with MariaDB Galera Cluster
Global Transaction ID
Podman with Quadlet
Maria-backup
Did you find this article interesting? Are you an “under the hood” kind of person? We’re really big on automation and we’re always looking for people in a similar vein to fill roles like this one as well as other roles here at Würth IT Italy.