Restoring the cluster after the master nodes' IP addresses change

1. Problem statement

The IP addresses of all master nodes in an OpenShift cluster can change, for example when the DHCP server reassigns the reserved IP addresses of the master nodes. The etcd component of the OpenShift cluster depends on static IP addresses. After the master nodes reboot with new IP addresses, the system fails to start because etcd expects the IP addresses that were assigned to the master nodes when the OpenShift cluster was installed.

2. Solution

This problem can be solved in two ways. The first method sets static IP addresses on the master nodes. After a reboot, the system resumes functioning. The second method extends the first and lets the master nodes go back to receiving their reserved IP addresses dynamically from the DHCP server.

3. Required tools and access for issue resolution

To resolve the issue of changing IP addresses for the master nodes, SSH access to the master nodes and DHCP server is required.

4. Master nodes recovery procedure

4.1. Method I

The step-by-step procedure for restoring the master nodes is as follows:

  1. Log in via SSH to the DHCP instance.

  2. Stop the DHCP server with the command: systemctl stop isc-dhcp-server.

  3. Create a backup copy of the lease file /var/lib/dhcp/dhcpd.leases.

  4. Log in via SSH to one of the cluster’s master nodes.

  5. Find the relevant network interface using the following command:

    # nmcli device|grep ovs-interface
    br-ex ovs-interface connected ovs-if-br-ex

  6. Verify that it is the correct network interface. Its settings should match the current network configuration, with the only difference being the value of IP4.ADDRESS[1]:

    # nmcli device show br-ex
    IP4.ADDRESS[1]:                         10.9.1.235/24

  7. Change the instance’s IP address to a static address using the following command, replacing <IP ADDRESS> with the desired cluster IP address:

    # nmcli connection modify br-ex ipv4.addresses <IP ADDRESS>/24

  8. Verify that the network is working by running any network command, for example by checking that the gateway is reachable with the 'ping' command:

    # ping 10.9.1.1

  9. Reboot the master node.

  10. After the reboot, ensure that the master node is running with the correct IP address.

  11. Verify that all OpenShift service pods are running on this node by executing the following command:

    # crictl ps

  12. Verify that the OpenShift cluster API is accessible after performing the above steps.

  13. Repeat the procedure for the remaining master nodes to finish restoring the cluster.
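
The lease-file backup and the interface address check from the procedure above can be scripted. The following is a minimal sketch, not part of the original procedure: the helper names `backup_file` and `first_ip4` and the variable `LEASE_FILE` are hypothetical, and the sketch assumes a POSIX shell with `cp`, `date`, and `awk` available.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not part of OpenShift or nmcli).

# Copy a file to a timestamped sibling and print the name of the copy.
backup_file() {
    src="$1"
    dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
    cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Read `nmcli device show <device>` output on stdin and print the first
# IPv4 address without its prefix length.
first_ip4() {
    awk '$1 == "IP4.ADDRESS[1]:" { split($2, parts, "/"); print parts[1]; exit }'
}

# Example usage (run as root on the respective hosts):
#   backup_file "$LEASE_FILE"             # LEASE_FILE: path of the DHCP lease file
#   nmcli device show br-ex | first_ip4   # prints the address currently on br-ex
```

Comparing the output of `first_ip4` with the address the node had at installation time gives a quick check, before and after the reboot, that the node is using the expected address.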

4.2. Method II

  1. Perform steps 1-12 of Method I (section 4.1).

  2. Start the DHCP server with the command:

    # systemctl start isc-dhcp-server

  3. Follow the documentation to replace the etcd members (master nodes) in the following order:

  4. Sequentially replace the other two, non-working etcd members (master nodes).

  5. Verify that all three master nodes are functioning normally using the following command:

    $ oc get nodes
    NAME                         STATUS   ROLES    AGE   VERSION
    mdtuddm-2n5bl-master-0       Ready    master   25h   v1.20.0+87cc9a4-1079
    mdtuddm-2n5bl-master-1       Ready    master   25h   v1.20.0+87cc9a4-1079
    mdtuddm-2n5bl-master-2       Ready    master   25h   v1.20.0+87cc9a4-1079

  6. Exclude the master node with the static address. After it transitions to the NotReady state, replace this etcd member (master node) according to the documentation.

  7. Verify that all three master nodes are functioning normally. Remove the excluded master node with the static IP address.
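
The node status check used in this method can also be automated. The following is a minimal sketch, assuming the default column layout of `oc get nodes` (NAME, STATUS, ROLES, AGE, VERSION). The function name `not_ready_masters` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get nodes` output on stdin and print the name
# of every master node whose STATUS column is not exactly "Ready".
# Assumes the default columns: NAME STATUS ROLES AGE VERSION.
not_ready_masters() {
    awk 'NR > 1 && $3 ~ /master/ && $2 != "Ready" { print $1 }'
}

# Example usage:
#   oc get nodes | not_ready_masters
```

Empty output means that every master node reports Ready. A node with a compound status such as Ready,SchedulingDisabled is also listed, because the comparison is exact.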