06. 10. 2022 Lorenzo Candeago DevOps

My OpenShift Journey #4: Run Unprivileged Containers with systemd in OpenShift: Part 1 – Deployment

For our ongoing transition from Jenkins to OpenShift, we’re currently working on porting our testing infrastructure to OpenShift.

Our tests involve installing and running our product, NetEye, in a container. The installation requires a working systemd environment inside the container, and systemd needs to run as PID 1 and as the root user (UID 0). Until now we’ve been able to do this by running the testing container as privileged.

Now, with OpenShift, we would like to run the container unprivileged, dropping as many capabilities as possible while still being able to run NetEye correctly.

We are currently running OpenShift Container Platform 4.11. For the work described in this blog post we followed the instructions provided by Marco Caimi, solution architect at Red Hat. We also took inspiration from previous work on the subject described in a blog post by Fraser Tweedale.

In this first blog post I’ll show you the config changes needed on the OpenShift side. In the next one I’ll show you how to test my proposed solution.

Solution Outline

By default, OpenShift uses cgroups v1. When we contacted Red Hat support, they suggested we enable cgroups v2 and user namespaces in CRI-O, the container runtime used by OpenShift. This combination allows us to have root privileges inside the container, but not outside it. To do this, we’ll rely on cgroups v2 together with CRI-O’s user namespace support: we will map an unprivileged UID outside of the container to the root UID inside the container, and we won’t use the privileged SCC (security context constraint). The only SCC required will be anyuid.
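
To make the UID mapping more concrete: the kernel describes a user namespace's mapping in /proc/<pid>/uid_map, one range per line, as three numbers (first UID inside the namespace, first UID outside it, length of the range). The sketch below translates a container UID into the corresponding host UID. The helper name host_uid and the sample range (host UIDs starting at 100000, length 65536) are assumptions for illustration, not values taken from our cluster.

```shell
#!/bin/sh
# host_uid CONTAINER_UID
# Reads uid_map-style lines ("inside outside length") on stdin and prints the
# host UID that CONTAINER_UID maps to. Exits non-zero if no range covers it.
host_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 {
            print $2 + (uid - $1)
            found = 1
            exit
        }
        END { exit (found ? 0 : 1) }
    '
}

# Hypothetical mapping: container UIDs 0..65535 -> host UIDs 100000..165535.
sample_map='0 100000 65536'

# Root inside the container corresponds to an ordinary UID on the host.
printf '%s\n' "$sample_map" | host_uid 0
```

With this sample mapping, UID 0 in the container resolves to host UID 100000, which is why a process can be root for systemd's purposes without being root on the node.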

Use cgroups v2 and User Namespace

Since user namespaces are not enabled by default in CRI-O (see tutorials/userns.md in the cri-o/cri-o repository), we need to enable them with a machine config:

cat /etc/crio/crio.conf.d/99-workload-userns.conf

[crio.runtime.workloads.userns]
activation_annotation = "io.kubernetes.cri-o.userns-mode"
allowed_annotations = ["io.kubernetes.cri-o.userns-mode"]

Furthermore, by default the kernel is booted with cgroups v1, so we want to add the following to the kernel boot parameters:

cgroup_no_v1=all psi=1 systemd.unified_cgroup_hierarchy=1

We’ll use the following MachineConfig to apply the necessary changes on master nodes:

---
#enable_user_ns.yml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-enable-cgroupv2-on-masters
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      # see https://github.com/cri-o/cri-o/blob/main/tutorials/userns.md
      - path: /etc/crio/crio.conf.d/99-workload-userns.conf
        overwrite: true
        contents:
          source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS53b3JrbG9hZHMudXNlcm5zXQphY3RpdmF0aW9uX2Fubm90YXRpb24gPSAiaW8ua3ViZXJuZXRlcy5jcmktby51c2VybnMtbW9kZSIKYWxsb3dlZF9hbm5vdGF0aW9ucyA9IFsiaW8ua3ViZXJuZXRlcy5jcmktby51c2VybnMtbW9kZSJdCg==
        mode: 420  # decimal 420 = octal 0644
  kernelArguments:
    - systemd.unified_cgroup_hierarchy=1
    - cgroup_no_v1="all"
    - psi=1
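
The source: field in the MachineConfig carries the CRI-O drop-in file as a base64 data URL. As a minimal sketch of how such a payload could be produced and checked, the following snippet encodes the configuration and decodes it again. The temporary file and the variable names are only for illustration.

```shell
#!/bin/sh
# Write the CRI-O drop-in to a temporary file (illustrative location only).
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[crio.runtime.workloads.userns]
activation_annotation = "io.kubernetes.cri-o.userns-mode"
allowed_annotations = ["io.kubernetes.cri-o.userns-mode"]
EOF

# Encode on a single line; tr strips the line wrapping some base64 tools add.
payload="$(base64 < "$conf" | tr -d '\n')"
data_url="data:text/plain;charset=utf-8;base64,${payload}"
echo "$data_url"

# Decode again to confirm the payload round-trips to the original file.
decoded="$(printf '%s' "$payload" | base64 -d)"
printf '%s\n' "$decoded"

rm -f "$conf"
```

Decoding the payload before applying the MachineConfig is a cheap way to catch a stale or truncated string, since the Machine Config Operator will write whatever the data URL contains.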

And then deploy the changes:

oc create -f ~/enable_user_ns.yml

We can check that the config has been created:

oc get machineconfig | grep 99-enable-cgroupv2-on-masters
99-enable-cgroupv2-on-masters 3.2.0 8d

We can then verify that the changes have been deployed by monitoring the status of the MachineConfigPools and waiting until the master nodes are updated:

oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-c2ac180ae7398bc7cec20106a2e41cbe   True      False      False      3              3                   3                     0                      107d
worker   rendered-worker-aedd42003621dc4c437a98e3c157a1fd   True      False      False      2              1                   2                     0                      107d
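
Rather than reading the table by eye, the check can be scripted. The sketch below parses the oc get mcp table and succeeds only when a pool reports UPDATED=True, UPDATING=False, DEGRADED=False and all of its machines ready. The function name mcp_settled is hypothetical, and the column positions are assumed to match the output shown above.

```shell
#!/bin/sh
# mcp_settled POOL
# Reads `oc get mcp` output on stdin. Succeeds if POOL is updated, not
# updating, not degraded, and READYMACHINECOUNT equals MACHINECOUNT.
mcp_settled() {
    awk -v pool="$1" '
        NR > 1 && $1 == pool {
            if ($3 == "True" && $4 == "False" && $5 == "False" && $6 == $7)
                ok = 1
        }
        END { exit (ok ? 0 : 1) }
    '
}

# Intended use on a cluster (not run here), polling until the pool settles:
#   until oc get mcp | mcp_settled master; do sleep 30; done
```

Applied to the table above, the master pool would pass, while the worker pool would not, because only one of its two machines is reported as ready. On a live cluster, oc wait mcp/master --for=condition=Updated should serve a similar purpose, although we have not used it for this post.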

We can also check the boot parameters of the master nodes so we can be sure that the configuration has been deployed and the machine has been rebooted:

cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/ [...] systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=1
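
The same check can be automated. The following sketch looks for the three kernel arguments as whole words in a command-line string and reports any that are missing. The function names has_karg and check_cgroupv2_kargs are hypothetical helpers, not part of any OpenShift tooling.

```shell
#!/bin/sh
# has_karg CMDLINE ARG: succeed if ARG appears as a whole word in CMDLINE.
has_karg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_cgroupv2_kargs CMDLINE: report each expected argument and return
# non-zero if at least one of them is missing.
check_cgroupv2_kargs() {
    cmdline="$1"
    missing=0
    for arg in systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=1; do
        if has_karg "$cmdline" "$arg"; then
            echo "ok: $arg"
        else
            echo "missing: $arg"
            missing=1
        fi
    done
    return "$missing"
}

# Intended use on a node (not run here):
#   check_cgroupv2_kargs "$(cat /proc/cmdline)"
```

On a node, this could be run from a debug shell. As an additional sanity check, stat -fc %T /sys/fs/cgroup prints cgroup2fs when the unified cgroups v2 hierarchy is mounted.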

Create a Service Account with anyuid SCC

The only SCC permission required is the anyuid one. For testing purposes, let’s create a service account systemd-test in the default namespace with the correct permissions:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: systemd-test
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: 'system:openshift:scc:anyuid'
  namespace: default
subjects:
  - kind: ServiceAccount
    name: systemd-test
    namespace: default 
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: 'system:openshift:scc:anyuid'

In my next blog post I’ll show you how to run a test container with the newly modified configuration and account, and how to verify that the modification was successful.
