02. 07. 2019 Benjamin Gröber NetEye, Unified Monitoring

How To Recover from a DRBD9 Metadata Split Brain Situation

As soon as you manage more than a few DRBD resources distributed over a wide set of hardware, split brain situations cannot always be avoided. Standard split brains are caused by multiple nodes having different opinions about the latest state of the data on their local disk.


Disclaimer: If applied incorrectly, the commands in this blog post may cause data loss. If you are unsure about any step, back up your data first.

Standard problematic cluster situations are commonly resolved by disconnecting and discarding the data of the faulty node, while reconnecting to the primary node as outlined below.

Standard Split Brain

In the following example we will assume a two node setup with primary.example.com being the node with good data in primary state, and faulty.example.com being the node which needs to be “fixed”.

Before proceeding with either procedure, please make sure that your primary node contains the copy of the data you want to keep!

[root@faulty]# drbdadm disconnect $RESOURCE_NAME
[root@faulty]# drbdadm --discard-my-data connect $RESOURCE_NAME:primary.example.com

[root@primary]# drbdadm connect $RESOURCE_NAME:faulty.example.com
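
Before discarding anything on the faulty node, it can help to double-check that you really are on the node you think you are. The following is only a minimal sketch: the function names are hypothetical, and it assumes that the first line of the `drbdadm status` output has the form `<resource> role:<Role>`, as current DRBD9 tools print it.

```shell
#!/bin/sh
# Hypothetical helper: extract the local role from `drbdadm status` output
# read on stdin (assumes the first line looks like "<resource> role:<Role>").
local_role() {
    sed -n '1s/.*role:\([A-Za-z]*\).*/\1/p'
}

# Hypothetical guard: succeed only if the local node is Secondary, i.e. it is
# not the node currently serving the data you want to keep.
check_victim_role() {
    role=$(local_role)
    if [ "$role" = "Secondary" ]; then
        echo "role check passed: local node is Secondary"
    else
        echo "role check failed: local role is '${role}'" >&2
        return 1
    fi
}

# Usage on the faulty node, with RESOURCE_NAME as in the commands above:
#   drbdadm status "$RESOURCE_NAME" | check_victim_role
```

The check is deliberately conservative: any role other than Secondary, including output it cannot parse, makes it fail.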

This standard procedure does not resolve every split brain, however. Sometimes the so-called “metadata”, which DRBD uses to keep track of its own actions, gets corrupted.

Metadata Split Brain

If you find yourself in the situation that, after following the aforementioned procedure, the disk is still “Inconsistent” or the connection between nodes doesn’t advance further than the “Connecting” state, you’re most probably a victim of metadata corruption.
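
The two symptoms described above can be checked from the output of `drbdadm status`. This is a rough sketch, not an official diagnostic: the function names are hypothetical, and the field names it looks for (`disk:`, `connection:Connecting`, `replication:SyncTarget`) are assumptions based on the DRBD9 status output. A node that has only just reconnected may show `Connecting` for a short while, so give it some time before drawing conclusions.

```shell
#!/bin/sh
# Hypothetical helper: print the state of the local disk, taken as the first
# "disk:<state>" field in `drbdadm status` output read on stdin.
local_disk_state() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^disk:/) { print substr($i, 6); exit } }'
}

# Hypothetical check: succeed if the status still shows one of the symptoms,
# i.e. a connection stuck in "Connecting", or a local disk that is
# "Inconsistent" without a resync (SyncTarget) running.
still_stuck() {
    status=$(cat)
    case $status in
        *connection:Connecting*) return 0 ;;
    esac
    case $status in
        *replication:SyncTarget*) return 1 ;;
    esac
    disk=$(printf '%s\n' "$status" | local_disk_state)
    [ "$disk" = "Inconsistent" ]
}

# Usage on the faulty node:
#   drbdadm status "$RESOURCE_NAME" | still_stuck && echo "still stuck"
```

An Inconsistent disk with a resync in progress is treated as healthy here, since that is the expected state right after discarding the local data.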

In this case you’ll need to invalidate the DRBD resource on the “faulty” node, so that its local data gets overwritten with the data from its peers, and then recreate the metadata from scratch. After the procedure, the node synchronizes again with the remote nodes.

[root@faulty]# drbdadm invalidate $RESOURCE_NAME
[root@faulty]# drbdadm down $RESOURCE_NAME
[root@faulty]# drbdadm create-md $RESOURCE_NAME
[root@faulty]# drbdadm adjust $RESOURCE_NAME

[root@primary]# drbdadm adjust $RESOURCE_NAME
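
Once both nodes have been adjusted, the resync should start. If you prefer a quick scripted check, something along these lines could summarize the state. Treat it as a sketch under assumptions: the function names are hypothetical, and it relies on the local `disk:` field and the `done:` percentage that DRBD9 prints in `drbdadm status` while a resync is running.

```shell
#!/bin/sh
# Hypothetical helper: print the state of the local disk, taken as the first
# "disk:<state>" field in `drbdadm status` output read on stdin.
local_disk_state() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^disk:/) { print substr($i, 6); exit } }'
}

# Hypothetical summary: one line describing how far the recovery has come.
sync_report() {
    status=$(cat)
    disk=$(printf '%s\n' "$status" | local_disk_state)
    # "done:<percent>" is assumed to appear only while a resync is running.
    pct=$(printf '%s\n' "$status" | sed -n 's/.*done:\([0-9.]*\).*/\1/p' | head -n 1)
    if [ "$disk" = "UpToDate" ]; then
        echo "in sync"
    elif [ -n "$pct" ]; then
        echo "resyncing: ${pct}% done"
    else
        echo "local disk: ${disk:-unknown}"
    fi
}

# Usage on the faulty node, e.g. repeated every few seconds:
#   drbdadm status "$RESOURCE_NAME" | sync_report
```

If the report stays at a `local disk:` line without any progress, the resync has not started and the resource needs a closer look.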

As mentioned in earlier posts, I strongly suggest that you use the drbdtop tool, available from the neteye-extras repository. You can use it to supervise and analyze the progress and state of the DRBD resources both during and after the recovery process.

Benjamin Gröber

R&D Software Architect at Wuerth Phoenix
Hi, my name is Benjamin, and I'm a Software Architect in the Research & Development Team of the "IT System & Service Management Solutions" Business Unit of Würth Phoenix. I discovered my passion for computers and technology when I was 7 and got my first PC. Just using computers and playing games was never enough for me, so just a few months later I started learning Visual Basic and entered the world of software development. Since then, my passion has been keeping up with the short-lived, fast-paced, ever-evolving IT world and exploring new technologies, eventually trying to put them to good use. I'm a strong advocate for writing maintainable software, and lately I've been investing most of my free time in exploring the emerging Rust programming language.