Replace the first master node of the worker cluster¶

This page will take a highly available three-master-node worker cluster as an example. When the first master node of the worker cluster fails or malfunctions, how to replace or reintroduce the first master node.

This page features a highly available cluster with three master nodes.

node1 (172.30.41.161)
node2 (172.30.41.162)
node3 (172.30.41.163)

Assuming node1 is down, the following steps will explain how to reintroduce the recovered node1 back into the worker cluster.

Preparations¶

Before performing the replacement operation, first obtain basic information about the cluster resources, which will be used when modifying related configurations.

Note

The following commands to obtain cluster resource information are executed in the management cluster.

Get the cluster name

Execute the following command to find the clusters.kubean.io resource corresponding to the cluster:

# For example, if the resource name of clusters.kubean.io is cluster-mini-1
# Get the name of the cluster
CLUSTER_NAME=$(kubectl get clusters.kubean.io cluster-mini-1 -o=jsonpath="{.metadata.name}{'\n'}")

Get the host list configmap of the cluster

kubectl get clusters.kubean.io cluster-mini-1 -o=jsonpath="{.spec.hostsConfRef}{'\n'}"
{"name":"mini-1-hosts-conf","namespace":"kubean-system"}

Get the configuration parameters configmap of the cluster

kubectl get clusters.kubean.io cluster-mini-1 -o=jsonpath="{.spec.varsConfRef}{'\n'}"
{"name":"mini-1-vars-conf","namespace":"kubean-system"}

Steps¶

Adjust the order of control plane nodes

Reset the node1 node to restore it to the state before installing the cluster (or use a new node), maintaining the network connectivity of the node1 node.

Adjust the order of the node1 node in the kube_control_plane, kube_node, and etcd sections in the host list (node1/node2/node3 -> node2/node3/node1):

function change_control_plane_order() {
  cat << EOF | kubectl apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mini-1-hosts-conf
namespace: kubean-system
data:
hosts.yml: |
    all:
    hosts:
        node1:
        ip: "172.30.41.161"
        access_ip: "172.30.41.161"
        ansible_host: "172.30.41.161"
        ansible_connection: ssh
        ansible_user: root
        ansible_password: dangerous
        node2:
        ip: "172.30.41.162"
        access_ip: "172.30.41.162"
        ansible_host: "172.30.41.162"
        ansible_connection: ssh
        ansible_user: root
        ansible_password: dangerous
        node3:
        ip: "172.30.41.163"
        access_ip: "172.30.41.163"
        ansible_host: "172.30.41.163"
        ansible_connection: ssh
        ansible_user: root
        ansible_password: dangerous
    children:
        kube_control_plane:
        hosts:
            node2:
            node3:
            node1:
        kube_node:
        hosts:
            node2:
            node3:
            node1:
        etcd:
        hosts:
            node2:
            node3:
            node1:
        k8s_cluster:
        children:
            kube_control_plane:
            kube_node:
        calico_rr:
        hosts: {}
EOF
}

change_control_plane_order

Remove the first master node in an abnormal state

After adjusting the order of nodes in the host list, remove the node1 in an abnormal state of the K8s control plane.

Note

If node1 is offline or malfunctioning, the following configuration items must be added to extraArgs, you need not to add them when node1 is online.

reset_nodes=false # Skip resetting node operation
allow_ungraceful_removal=true # Allow ungraceful removal operation

# Image spray-job can use an accelerator address here

SPRAY_IMG_ADDR="ghcr.m.daocloud.io/kubean-io/spray-job"
SPRAY_RLS_2_22_TAG="2.22-336b323"
KUBE_VERSION="v1.24.14"
CLUSTER_NAME="cluster-mini-1"
REMOVE_NODE_NAME="node1"

cat << EOF | kubectl apply -f -
---
apiVersion: kubean.io/v1alpha1
kind: ClusterOperation
metadata:
  name: cluster-mini-1-remove-node-ops
spec:
  cluster: ${CLUSTER_NAME}
  image: ${SPRAY_IMG_ADDR}:${SPRAY_RLS_2_22_TAG}
  actionType: playbook
  action: remove-node.yml
  extraArgs: -e node=${REMOVE_NODE_NAME} -e reset_nodes=false -e allow_ungraceful_removal=true -e kube_version=${KUBE_VERSION}
  postHook:
    - actionType: playbook
      action: cluster-info.yml
EOF

Manually modify the cluster configuration, edit and update cluster-info

# Edit cluster-info
kubectl -n kube-public edit cm cluster-info

# 1. If the ca.crt certificate is updated, the content of the certificate-authority-data field needs to be updated
# View the base64 encoding of the ca certificate:
cat /etc/kubernetes/ssl/ca.crt | base64 | tr -d '\n'

# 2. Change the IP address in the server field to the new first master IP, this document will use the IP address of node2, 172.30.41.162

Manually modify the cluster configuration, edit and update kubeadm-config

# Edit kubeadm-config
kubectl -n kube-system edit cm kubeadm-config

# Change controlPlaneEndpoint to the new first master IP,
# this document will use the IP address of node2, 172.30.41.162

Scale up the master node and update the cluster

Note

Use --limit to limit the update operation to only affect the etcd and kube_control_plane node groups.
If it is an offline environment, spec.preHook needs to add enable-repo.yml, and the extraArgs parameter should fill in the correct repo_list for the related OS.

cat << EOF | kubectl apply -f -
---
apiVersion: kubean.io/v1alpha1
kind: ClusterOperation
metadata:
  name: cluster-mini-1-update-cluster-ops
spec:
  cluster: ${CLUSTER_NAME}
  image: ${SPRAY_IMG_ADDR}:${SPRAY_RLS_2_22_TAG}
  actionType: playbook
  action: cluster.yml
  extraArgs: --limit=etcd,kube_control_plane -e kube_version=${KUBE_VERSION}
  preHook:
    - actionType: playbook
      action: enable-repo.yml  # This yaml needs to be added in an offline environment,
                               # and set the correct repo-list (install operating system packages),
                               # the following parameter values are for reference only
      extraArgs: |
        -e "{repo_list: ['http://172.30.41.0:9000/kubean/centos/\$releasever/os/\$basearch','http://172.30.41.0:9000/kubean/centos-iso/\$releasever/os/\$basearch']}"
  postHook:
    - actionType: playbook
      action: cluster-info.yml
EOF

This completes the replacement of the first Master node.

Replace the first master node of the worker cluster¶

Preparations¶

Steps¶

Comments