In my previous blog post, we modified the boot parameters to enable cgroups v2 and the user namespace in CRI-O. In this second part I’ll show you how to run a sample container with systemd and check that the modifications we made actually worked.
To test the new configuration, let’s use a simple Docker container with systemd enabled, based on CentOS.
The container can be deployed with a standard manifest:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: systemd-deployment
  labels:
    app: systemd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: systemd
  template:
    metadata:
      labels:
        app: systemd
      annotations:
        io.kubernetes.cri-o.userns-mode: "auto"
    spec:
      serviceAccountName: systemd-test
      automountServiceAccountToken: True
      containers:
      - name: systemd-test
        image: registry.access.redhat.com/ubi9:latest
        command: ["/sbin/init"]
        securityContext:
          allowPrivilegeEscalation: False
          capabilities:
            drop:
            - ALL
          runAsNonRoot: False
Note that we set runAsNonRoot: False because systemd is executed as root inside the container (as mentioned before, that root user is mapped to a non-root UID outside of the container namespace), and we add the annotation io.kubernetes.cri-o.userns-mode: "auto" to enable the CRI-O user namespace. To simplify testing, instead of building a Docker image with systemd, we picked Red Hat’s UBI image and overrode the container’s entry point (command: ["/sbin/init"]) to load the necessary services.
First let’s get the name of the pod and some information on which node the pod is running:
oc get pods -n default -o wide | grep systemd
systemd-deployment-5698997785-45tzz   1/1   Running   0   4m50s   10.128.2.94   node04   <none>   <none>
and then log in to the container:
oc -n default rsh systemd-deployment-5bfd6fdb56-2ftq4
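Since the generated pod name changes every time the Deployment is rolled out, copying it by hand is easy to get wrong. As a minimal sketch, the name can be looked up through the app: systemd label from the manifest instead. The helper name first_pod is hypothetical, and the sketch assumes that oc is already logged in to the cluster and that at least one matching pod exists:

```shell
# first_pod NAMESPACE SELECTOR
# Hypothetical helper: print the name of the first pod matching a label selector.
first_pod() {
    oc get pods -n "$1" -l "$2" -o jsonpath='{.items[0].metadata.name}'
}

# Possible usage (needs a reachable cluster):
#   pod=$(first_pod default app=systemd)
#   oc -n default rsh "$pod"
```

This only returns the first match, which is enough here because the Deployment runs a single replica.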
Once inside, let’s install the ps utility to check the details of the running processes:
sh-5.1# dnf install -y procps
As we can see, inside of the container we are the root user:
sh-5.1# whoami
root
and we can see that the init process inside the container is running with PID 1 as root:
sh-5.1# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 15:24 ?        00:00:00 /sbin/init
root          18       1  0 15:24 ?        00:00:00 /usr/lib/systemd/systemd-journald
root          29       0  0 15:24 pts/0    00:00:00 sh
root          57      29  0 15:29 pts/0    00:00:00 ps -ef
and that systemd is running:
sh-5.1# systemctl status
● systemd-deployment-5bfd6fdb56-2ftq4
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Mon 2022-10-03 15:24:41 UTC; 17h ago
   CGroup: /
           ├─init.scope
           │ ├─  1 /sbin/init
           │ ├─ 29 sh
           │ ├─277 /bin/sh
           │ ├─291 systemctl status
           │ └─292 "(pager)"
           └─system.slice
             ├─dbus-broker.service
             │ ├─64 /usr/bin/dbus-broker-launch --scope system --audit
             │ └─65 dbus-broker --log 4 --controller 9 --machine-id 4b9d5f875116426badd8e681f903b8f3 --max-bytes 536870912 --max-fds 4096 --max-matches 16384 --audit
             └─systemd-journald.service
               └─18 /usr/lib/systemd/systemd-journald
Now, let’s check the sandbox on the host system: we want to verify that we are running as an unprivileged user on the host.
So first let’s get the container’s ID on the OpenShift node where the pod is running:
sudo crictl ps | grep systemd
CONTAINER      IMAGE                                                                                                    CREATED       STATE    NAME          ATTEMPT  POD ID
8158b7ccd8ae1  registry.access.redhat.com/ubi9@sha256:c40e515aaebf3da366419d4eae3f0a9fe95ef88f4b942b7cf8ce421010e3969c  15 hours ago  Running  systemd-test  0        5dafb65aeccb1
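If the ID is needed in a script rather than read off the screen, crictl can filter by container name and print only IDs. The following is a rough sketch: the helper name container_id is hypothetical, and it assumes it is run as root on the node (the function calls crictl directly, without sudo):

```shell
# container_id NAME
# Hypothetical helper: print the ID of the first container whose name
# matches NAME. --name filters by container name, --quiet prints only IDs.
container_id() {
    crictl ps --name "$1" --quiet | head -n 1
}

# Possible usage on the node, as root:
#   cid=$(container_id systemd-test)
```

The value can then be passed to crictl inspect in place of the short ID copied by hand.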
and then we’ll inspect the container to check that it really is unprivileged and to get the PID of its init process on the host:
sudo crictl inspect 8158b7ccd8ae1 | jq '.info.privileged, .info.pid'
false
1837568
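With the host PID in hand, one more check that may be useful is to compare the user namespace of the container’s init process with that of the host’s PID 1: if the user namespace is really enabled, the two should differ. This is only a sketch based on the /proc/PID/ns/user symlinks that Linux exposes; the helper names userns_of and in_same_userns are hypothetical, and reading the namespace of another user’s process normally requires root:

```shell
# userns_of PID
# Print the user-namespace identifier of a process, e.g. user:[4026531837].
userns_of() {
    readlink "/proc/$1/ns/user"
}

# in_same_userns PID_A PID_B
# Return 0 if both processes share a user namespace, 1 if they do not,
# and 2 if either namespace link could not be read.
in_same_userns() {
    a=$(userns_of "$1") || return 2
    b=$(userns_of "$2") || return 2
    [ "$a" = "$b" ]
}

# Possible usage on the node, as root, with the PID reported by crictl inspect:
#   in_same_userns 1 1837568 || echo "container runs in its own user namespace"
```

Note that the usage line also prints the message when a link cannot be read (return code 2), so a stricter script would check the exit status explicitly.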
Now we can check how the pid and the uid of the process running in the container are mapped to the host’s namespace:
sudo pgrep --ns 1837568 | xargs ps -o pid,uid,cmd
    PID    UID CMD
1837568 165536 /sbin/init
1837700 165536 /usr/lib/systemd/systemd-journald
1837981 165536 sh
1889217 165617 /usr/bin/dbus-broker-launch --scope system --audit
1889222 165617 dbus-broker --log 4 --controller 9 --machine-id 4b9d5f875116426badd8e681f903b8f
3315216 165536 /bin/sh
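As we can see, the processes running in the container are mapped to non-privileged PIDs and UIDs on the host system: /sbin/init, which is PID 1 and root inside the container, appears on the host as PID 1837568 running with UID 165536.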
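The UID translation follows the usual user-namespace rule: each line of a process’s uid_map file lists the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below applies that rule in shell. The helper name to_host_uid is hypothetical, and the mapping 0 165536 65536 used in the usage comment is an assumption: the base 165536 is inferred from the ps output above, while the range length is a guess, so the authoritative values are the ones in /proc/self/uid_map inside the container.

```shell
# to_host_uid CONTAINER_UID [UID_MAP_FILE]
# Hypothetical helper: translate a UID seen inside the container into the
# UID seen on the host, using a uid_map-style file (default: our own map).
to_host_uid() {
    awk -v uid="$1" '
        # Fields: <first ID inside ns> <first ID outside ns> <range length>
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) exit 1 }   # UID not covered by any mapped range
    ' "${2:-/proc/self/uid_map}"
}

# Possible usage with an assumed mapping saved to a file:
#   echo "0 165536 65536" > /tmp/uid_map.sample
#   to_host_uid 0  /tmp/uid_map.sample
#   to_host_uid 81 /tmp/uid_map.sample
```

Under that assumed mapping, container UID 0 corresponds to host UID 165536, and the host UID 165617 shown for dbus-broker would correspond to UID 81 inside the container (165617 - 165536), presumably the dbus service user, although that last point is an inference rather than something shown above.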
We were able to run a shell as the root user within the container namespace, while being a non-privileged user outside of the container. In a future blog post we’ll investigate how to replace the SCC that we’ve used up to now (anyuid) with a more specific one, and how to remove more capabilities.
This is still an experimental approach, but for these initial tests it seems to work: your mileage may vary. And thanks again to Marco Caimi from Red Hat for their support.