25. 08. 2023 Lorenzo Candeago DevOps

Debug and Workarounds for a Stuck Update on OpenShift 4.13.6

Today we wanted to update our OpenShift cluster, and after a while we came up against the following error:

Not good…

Let’s start by checking the clusterversion to investigate if we can find any errors:

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.6    True        True          91m     Unable to apply 4.13.9: an unknown error has occurred: MultipleErrors

By analyzing the logs further we found the following error:

oc get clusterversion -o yaml

[...]
    - lastTransitionTime: "2023-08-25T13:55:14Z"
      message: |-
        Multiple errors are preventing progress:
        * Cluster operator kube-apiserver is updating versions
        * deployment openshift-etcd-operator/etcd-operator is not available MinimumReplicasUnavailable (Deployment does not have minimum availability.) or progressing ProgressDeadlineExceeded (ReplicaSet "etcd-operator-74cc7479b7" has timed out progressing.)
      reason: MultipleErrors
      status: "True"
      type: Failing

it seems like the openshift-etcd-operator is not able to deploy the necessary pods. Let’s investigate it further and see which pods are running in the openshift-etcd-operator:

oc get pods -n openshift-etcd-operator
NAME                             READY   STATUS                 RESTARTS   AGE
etcd-operator-74cc7479b7-h8t64   0/1     CreateContainerError   0          97m

OK, what’s going on with the pod?

 oc describe pod etcd-operator-74cc7479b7-h8t64  -n openshift-etcd-operator
[...]
Warning  Failed          96m (x3 over 97m)      kubelet            (combined from similar events): Error: container create failed: time="2023-08-25T12:25:04Z" 
level=error msg="runc create failed: unable to start container process:
unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524"

According to the RedHat release notes, it seems that the issue

is due to a CoreOS limitation in the number of seccomp profiles that can be created on the worker node.

and is related to the following bug: https://issues.redhat.com/browse/OCPBUGS-2637.

Bad news. It seems like a kernel bug related to the architecture amd64.

After some further searching we’ve been able to pin down the issue to the specific OpenShift version 4.13.6 https://issues.redhat.com/browse/OCPBUGS-16655.

The bug still isn’t solved, but at least there’s a suggested workaround that has to be run on all worker nodes:

sudo sysctl status net.core.bpf_jit_limit=364241152

After that, the update seems to be working again!

Lorenzo Candeago

DevOps Engineer at Würth Phoenix

Author

Lorenzo Candeago

DevOps Engineer at Würth Phoenix

Latest posts by Lorenzo Candeago

24. 07. 2024 DevOps

How to Add SSH Keys to ArgoCD and Tekton on OpenShift to Access Gitea: Part 3 – A Simple Tekton TaskRun

17. 07. 2024 DevOps

How to Add SSH Keys to ArgoCD and Tekton on OpenShift to Access Gitea: Part2 – Add an SSH Key as a Secret to ArgoCD and Run a Test Deployment

17. 07. 2024 DevOps

How to Add SSH Keys to ArgoCD and Tekton on OpenShift to Access Gitea: Part1 – Set up the Test Environment and Add SSH Key to Gitea

30. 10. 2023 DevOps

How to Fix OpenShift Console not Showing the Characters in Firefox

30. 10. 2023 DevOps

LVM Disks on Azure for Dockerized Applications

See All

Debug and Workarounds for a Stuck Update on OpenShift 4.13.6

Lorenzo Candeago

Author

Lorenzo Candeago

Latest posts by Lorenzo Candeago

Leave a Reply Cancel reply

Search by technology

Contact

Subscribe to blog

Categories

Recent posts

Archive

Debug and Workarounds for a Stuck Update on OpenShift 4.13.6

Lorenzo Candeago

Author

Lorenzo Candeago

Latest posts by Lorenzo Candeago

Related Content

How to Add SSH Keys to ArgoCD and Tekton on OpenShift to Access Gitea: Part 3 – A Simple Tekton TaskRun

How to Add SSH Keys to ArgoCD and Tekton on OpenShift to Access Gitea: Part2 – Add an SSH Key as a Secret to ArgoCD and Run a Test Deployment

How to Add SSH Keys to ArgoCD and Tekton on OpenShift to Access Gitea: Part1 – Set up the Test Environment and Add SSH Key to Gitea

OpenShift: How to Check and Reset Ceph Storage in Warning State

How We Want to Avoid Breaking the NetEye User Guide (Again)

Leave a Reply Cancel reply

Search by technology

Contact

Subscribe to blog

Categories

Recent posts

Archive