Prevent CVE exploits in your Kubernetes cluster with seccomp and SELinux profiles

The other day I was reading a cool post by Snyk in which the author exploits a known CVE in ImageMagick to get a reverse shell, and I started wondering what it would take to write policies that would prevent this attack.

This would also be a nice demonstration of a project I have been contributing to recently, called the Security Profiles Operator (SPO), which enables you to easily distribute and use security profiles in a Kubernetes cluster. While it has been possible to use profiles like seccomp or SELinux in Kubernetes clusters for some time, the distribution and usage part is something new that SPO enables; it would otherwise be a manual and tedious process.

Note that the security policies are always a security benefit and should be used for any pods, but they are even more important for privileged pods. Your cluster should ideally not run any privileged pods and if it does, they should be constrained. This is unfortunately not the case in many Kubernetes distributions.

The original blog post by Snyk comes with a GitHub demo repository, which I forked and added all the manifests to. You can follow the original repo to play around with the exploits locally using docker or podman, and you can use the fork to follow along with this blog post.

You’ll need:

  • a Kubernetes cluster, ideally with admin access so that you can try all the scenarios, including the one that deploys SPO. I used a vanilla 1.20 cluster whose nodes were Fedora 33 VMs running on libvirt. If your cluster nodes run another Linux distribution, you might not be able to test the SELinux profiles, but the seccomp part should be testable anywhere.
  • another machine reachable from the cluster to test the reverse shell. For the demo, I used another VM just to really test that the exploit can reach outside the cluster, but you can even use another cluster node.
  • a local clone of my goof-container-breaking-in demo repo

Try out the exploit in Kubernetes

Let’s first verify we can exploit an image running in a cluster to make sure we have something tangible to defend against. Because the exploit starts a reverse shell, we’re going to prepare the listening endpoint that will just sit there and wait for the vulnerable pod to be attacked, which would initiate the connection from the container and start the reverse shell. Clone the demo repo and make sure the payload includes the IP address of this listening machine.

As said before, I was testing with a local VM running Fedora. Log in to the machine and make sure netcat is installed:
$ sudo dnf -y install nc

Then, start the listener on a port that matches the port set in the exploit payload:
$ nc -l 3131
Type some command (ls is good enough). The command will just hang for now, until you run the exploit in the vulnerable pod.

Now it’s time to run the vulnerable payload in the cluster. From the checkout of the repo, run:
$ kubectl create -f k8s/goof.yaml
to create the pod. For the sake of simplicity, we’re deploying the app in the same namespace as the operator.

The pod is not very complex:

---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    app: security-profiles-operator
  name: security-profiles-operator
---
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    app: goof
  name: goof
  namespace: security-profiles-operator
spec:
  containers:
  - image: quay.io/jhrozek/goof-container:latest
    name: goof
    ports:
    - containerPort: 3112

I didn’t bother setting up services for the demo, but instead just exposed the container of the pod with port-forward:
$ kubectl port-forward pod/goof -nsecurity-profiles-operator 8080:3112

Navigate to http://localhost:8080/public using your browser and you should see the image resizing app. Click Browse, select the payload (exploits/nc-reverse-shell.jpg) and click Resize to submit it. The exploit now does its thing and runs our code, which fetches nc and connects to our listener VM. At this point, or a couple seconds later, we should see the “ls” command returning output.

The reverse shell might seem like a toy, but it’s actually useful for lateral movement in the cluster. Depending on how the pod and the cluster are (mis)configured, there are interesting things you can achieve. Try some other commands in the shell to get a feel for what you can do (hint: a service account’s token is a good start).
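
As a hedged illustration of the hint above: Kubernetes normally mounts the pod’s service account credentials under /var/run/secrets/kubernetes.io/serviceaccount unless automounting is disabled. The sketch below only reports whether those files are present in the container. The SA_DIR variable and the check_sa_mount function name are made up for this example and are not part of the demo repo.

```shell
#!/bin/sh
# Sketch: report which service account files are mounted in this container.
# SA_DIR and check_sa_mount are illustrative names, not part of the demo repo.
SA_DIR="${SA_DIR:-/var/run/secrets/kubernetes.io/serviceaccount}"

check_sa_mount() {
    for f in token ca.crt namespace; do
        if [ -r "$SA_DIR/$f" ]; then
            echo "$f: present"
        else
            echo "$f: missing"
        fi
    done
}

check_sa_mount
```

If the token is present, anything that gets a shell in the pod can authenticate to the API server as that service account, which is why the permissions granted to the pod’s service account matter.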

We’ve established that the image can be exploited. Delete the pod now:
$ kubectl delete pods goof -nsecurity-profiles-operator

Preventing the attack

So how do you prevent mischief like this? There are several things you can do, and you should actually employ a combination of them for defense in depth:

  • detect vulnerable images and prevent them from being deployed. The original blog post from Snyk talks about this. Of course this presumes that you know that the image is vulnerable. In case someone decides to exploit an unknown vulnerability or a configuration issue, this won’t help.
  • detect and prevent the reverse shells. Sysdig had a nice blog post about that some time ago although I haven’t tried it myself.
  • restrict the app to only do what it’s supposed to do. Even if the app is exploited, the potential to do damage is limited. This can be done using several techniques, including limiting the container capabilities or using security profiles like seccomp or SELinux.
  • I guess many more…

In the rest of the post, we’ll concentrate on restricting the app to only do what it’s supposed to do using both seccomp and SELinux and, just as importantly, on making sure those policies can be used as first-class cluster objects so that we can list the policies installed in the cluster, view their status, see which workloads use which policy and so on.

The seccomp and SELinux policies are really complementary. While seccomp restricts the syscall surface of an application, SELinux allows you to restrict what kind of objects the app can interact with, based on labels on both processes and files. For example, if your app needs to write somewhere, you’ll permit the usage of the write syscall, but using SELinux, you’ll be able to express that the app can only write to files labeled with a certain label and no others. The same applies to e.g. devices that might need to be mounted into privileged containers: using SELinux, you can restrict the app to only communicate with the devices labeled with the appropriate label.

Deploy SPO

Follow the official documentation of SPO to install it. Don’t forget to install cert-manager first.

The central part of the SPO configuration is the “spod” CRD, which will eventually show you that SPO is up and running:

$ kubectl get spod -nsecurity-profiles-operator
NAME STATE
spod RUNNING

Prevent the attack with a seccomp profile

Let’s first start with the seccomp profile as more people are probably familiar with seccomp and virtually any Kubernetes cluster allows you to deploy seccomp profiles.

Of course, we need to generate the policy first. The ideal way is to generate the profile in an environment that is very close to where your app is actually running, and SPO offers a feature that does just that through the ProfileRecording CRD. This feature requires CRI-O 1.21.0 or newer, and in addition your worker nodes must have the oci-seccomp-bpf-hook package installed. Because this is not the case in all available clusters, we’ll show both the ProfileRecording approach and how to generate the profiles locally.

Finally, if you’re not interested in generating the policies, feel free to use the one in the repo (k8s/seccomp-policy.yaml) and just skip to the section where we use the policy.

Generate the seccomp profile in-cluster

To generate the policy, create a ProfileRecording CR that matches the goof app we’re running:

apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: ProfileRecording
metadata:
   # The name of the Recording is the same as the resulting SeccompProfile CR
   name: goof
   namespace: security-profiles-operator
spec:
   kind: SeccompProfile
   podSelector:
      matchLabels:
         app: goof

Next, create the app again:
$ kubectl create -f k8s/goof.yaml
and verify that the pod is annotated with "io.containers.trace-syscall/goof". This annotation tells the CRI-O runtime that it should record the syscalls the container does.

Now forward the ports again:
$ kubectl port-forward pod/goof -nsecurity-profiles-operator 8080:3112
navigate to the app as earlier and convert an ordinary image, not the exploit. Once you’re done, remove the goof pod:
$ kubectl delete pods goof -nsecurity-profiles-operator

In a bit, a policy should be automagically defined in your cluster and you’ll be able to see it with:

$ kubectl get sp goof -nsecurity-profiles-operator
NAME STATUS AGE
goof Installed 10m

including the details:
$ kubectl get sp goof -nsecurity-profiles-operator -oyaml

Now you can remove the profilerecording object:
$ kubectl delete profilerecordings goof -nsecurity-profiles-operator

Generate the seccomp policy locally

To make the article usable also for people who can’t run the new version of CRI-O or install the oci-seccomp-bpf-hook in their clusters, let’s also generate the policy locally. I generated the profile locally with podman using an OCI seccomp BPF hook as described here.

Generating the profile locally is of course not ideal – you’ll ideally want to generate the profile in the same or very similar environment to the one you’ll be running the app in as the profile needs to include both the syscalls your app is doing as well as what the container runtime is doing. I had to adjust the podman-generated policy a bit to include the setuid() call in my case.

So to generate the profile, let’s start the app container locally, but tell podman to record the syscalls using the seccomp bpf hook. That can be done using the io.containers.trace-syscall annotation:
$ sudo podman run --annotation io.containers.trace-syscall="of:/tmp/goof-seccomp-normal.json" --rm -p 3112:3112 --name goof quay.io/jhrozek/goof-container

Navigate to http://localhost:3112/public and convert a normal image to get the baseline seccomp policy. Let’s do a similar thing again, just recording into a different file, and this time let’s feed the app the exploit (exploits/nc-reverse-shell.jpg) again:
$ sudo podman run --annotation io.containers.trace-syscall="of:/tmp/goof-seccomp-rsh.json" --rm -p 3112:3112 --name goof quay.io/jhrozek/goof-container

Note that we run the containers using "sudo podman". This is needed to be allowed to record the policies.

Now if we compare the two policies:
$ diff <(jq -c '.syscalls[].names[]' < /tmp/goof-seccomp-normal.json) <(jq -c '.syscalls[].names[]' < /tmp/goof-seccomp-rsh.json)

we see that the one generated using the exploit contains several more syscalls:

2a3
> "alarm"
10a12
> "connect"
18a21,23
> "fchmodat"
> "fcntl"
> "fork"
25a31,32
> "getpeername"
> "getpgrp"
28a36
> "getrusage"
29a38,39
> "getsockopt"
> "gettid"
40a51
> "newfstatat"
41a53
> "pipe"
45a58
> "prlimit64"
47a61
> "recvfrom"
62a77,78
> "statfs"
> "symlink"
66a83,85
> "unlink"
> "unlinkat"
> "utimensat"
68a88


Presumably connect is what is used to establish the outbound connection, and fchmodat was used to make the netcat binary executable after the exploit downloaded it. So this is already looking promising!
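
The same comparison can be made without reading diff hunks. This is a minimal sketch, assuming the syscall names from each recording have already been written one per line to plain text files (for example with the jq expression above plus the -r flag). The extra_syscalls function name and the file names in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch: print the syscalls that appear only in the second list.
# Inputs are plain text files with one syscall name per line, e.g.
#   jq -r '.syscalls[].names[]' /tmp/goof-seccomp-normal.json > normal.txt
#   jq -r '.syscalls[].names[]' /tmp/goof-seccomp-rsh.json > rsh.txt
extra_syscalls() {
    baseline_sorted=$(mktemp)
    candidate_sorted=$(mktemp)
    sort -u "$1" > "$baseline_sorted"
    sort -u "$2" > "$candidate_sorted"
    # comm -13 suppresses lines unique to file 1 and lines common to both,
    # leaving only the lines unique to file 2.
    comm -13 "$baseline_sorted" "$candidate_sorted"
    rm -f "$baseline_sorted" "$candidate_sorted"
}
```

Running extra_syscalls normal.txt rsh.txt then lists only the syscalls the exploit needed on top of the normal workload, which are exactly the ones a profile built from the baseline recording will refuse.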

Now you’d convert the policy to the SeccompProfile object definition. To save you the work, you can just head to the goof GitHub repo and use the seccomp-profile-binding.yaml file:
$ kubectl create -f k8s/seccomp-profile-binding.yaml
Verify that the profile was created. Also peek at the securityprofilenodestatuses objects to see the status of the policy per node:
$ kubectl get securityprofilenodestatuses,seccompprofiles -nsecurity-profiles-operator
All the policy statuses should say “Installed”.
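
For readers who would rather do the conversion themselves than use the prepared file, here is a rough sketch that wraps a list of recorded syscall names in a SeccompProfile manifest. It assumes the names are in a text file, one per line. The apiVersion is assumed to match the other SPO objects in this post, SCMP_ACT_ERRNO and SCMP_ACT_ALLOW are standard seccomp actions, and make_seccomp_profile is a hypothetical helper, so compare the output with the manifest in the repo before relying on it.

```shell
#!/bin/sh
# Sketch: emit a SeccompProfile manifest that allows only the listed syscalls.
# Usage: make_seccomp_profile NAME NAMESPACE SYSCALL_LIST_FILE
# The apiVersion below is an assumption based on the other CRs in this post.
make_seccomp_profile() {
    cat <<EOF
apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: SeccompProfile
metadata:
  name: $1
  namespace: $2
spec:
  # Anything not listed below fails with an error instead of running.
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
  - action: SCMP_ACT_ALLOW
    names:
EOF
    # One YAML list item per unique, non-empty syscall name.
    grep -v '^$' "$3" | sort -u | sed 's/^/    - /'
}
```

The output can be reviewed first and then piped to kubectl create -f -. Remember that a locally recorded list may miss syscalls the cluster’s container runtime needs, as noted earlier.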

Test that the seccomp profile prevents the attack

Now it’s time to put our policy to use! We could reference our policy directly in the pod manifest, but that’s not very scalable. What we probably want instead is to make sure all workloads with a certain image have a certain policy applied. This is where the ProfileBinding CRD comes into play. The binding object binds an image to a profile reference:

apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: ProfileBinding
metadata:
   name: goof-binding
   namespace: security-profiles-operator
spec:
   profileRef:
      kind: SeccompProfile
      name: goof
   image: quay.io/jhrozek/goof-container:latest

Now, let’s deploy the vulnerable pod again:
$ kubectl create -f k8s/goof.yaml

Verify that the policy was applied by checking the pod manifest in the cluster:
$ kubectl get pods goof -nsecurity-profiles-operator -ojsonpath='{.metadata.annotations}' | jq
and you should see the pod annotated with the seccomp policy on-disk path:
"container.seccomp.security.alpha.kubernetes.io/goof": "localhost/operator/security-profiles-operator/goof.json"
Similarly, the SeccompProfile object keeps track of the workloads that use the profile so that you can’t remove a profile in use by accident:
$ kubectl get sp goof -nsecurity-profiles-operator -ojsonpath='{.status.activeWorkloads}'
["security-profiles-operator/goof"]

Finally, let’s run the exploit again:
$ kubectl port-forward pod/goof -nsecurity-profiles-operator 8080:3112
Navigate to localhost:8080, click browse, select the payload (exploits/nc-reverse-shell.jpg) and click Resize to submit it.

…and nothing! The attack was prevented by the seccomp profile and your company won’t be on the front page of any IT newspaper tomorrow for all the wrong reasons.

Let’s clean up after our experiments so that we can move on to the next one with SELinux.

$ kubectl delete pods goof -nsecurity-profiles-operator
$ kubectl delete profilebinding --all -nsecurity-profiles-operator
$ kubectl delete sp -nsecurity-profiles-operator --all

Prevent the attack with an SELinux profile

In some distributions, like OpenShift, SELinux is used by default to make sure that pods are really isolated and contained from each other. There are some really good blog posts on the topic by Dan Walsh and by my teammate Ozz.

This post has already gotten way longer than I anticipated, so I’ll just show how to create and use a policy without going too deep into details about how SELinux works in Kubernetes. If you peek at Ozz’s post, you might notice that the workflow includes generating a policy locally and then distributing the policy file to each of the cluster nodes. Ouch, this doesn’t sound very scalable and easy. Do you now need to ssh into the nodes and manage the policies yourself?

To address this concern, Ozz and I wrote a small project called selinuxd, which is now used by SPO and handles installation and removal of the policies for you. This means that with the help of SPO and selinuxd, installing an SELinux policy is as easy as kubectl create. I’m tactfully omitting the part where you need to create your own policy: at the moment, we still don’t have a good answer other than generating the policy locally, but we have ideas for the future 🙂

First, let’s enable SELinux in SPO:
$ kubectl patch spod spod -p '{"spec":{"enableSelinux": true}}' --type=merge
This triggers a rollout of the DaemonSet called “spod”, which might take up to a couple of minutes (copying and using the SELinux policies is slow).

Generate the SELinux policy locally

As said earlier, there is no way to generate the policy in cluster at the moment, although the feature is planned. So we’ll resort to generating the policy locally, with podman and udica. Run the container:
$ podman run --rm -p 3112:3112 --name goof quay.io/jhrozek/goof-container
and in another terminal, inspect the container and point udica at the JSON file that podman inspect generates:
$ podman inspect goof > /tmp/goof.json
$ sudo udica -j /tmp/goof.json goof-selinux

This will generate a .cil file that we will wrap in a SelinuxProfile CR. You can view the result in the repo (k8s/selinuxpolicy-goof.yaml).

Test that the policy prevents the attack

You can use the prepared policy in the repo (k8s/selinuxpolicy-goof.yaml). Create it as a usual k8s object:
$ kubectl create -f k8s/selinuxpolicy-goof.yaml

Installing the policy takes a bit of time during which selinuxd on the hosts does its job, so let’s wait until the policy switches to installed:

$ kubectl get selinuxprofile
NAME USAGE STATE
goof goof_security-profiles-operator.process Pending
$ kubectl wait --for=condition=ready selinuxprofile goof
selinuxprofile.security-profiles-operator.x-k8s.io/goof condition met
$ kubectl get selinuxprofile
NAME USAGE STATE
goof goof_security-profiles-operator.process Installed

Now that the policy is installed, we can put it to use. The policy name (called usage) is different from the profile object name, which can be confusing, but either way, the policy name can be found at .status.usage:
$ kubectl get selinuxprofile goof -ojsonpath='{.status.usage}'
goof_security-profiles-operator.process

We can put this string into the type field of the container’s .securityContext.seLinuxOptions in the pod spec:

spec:
  containers:
  - image: quay.io/jhrozek/goof-container:latest
    name: goof
    ports:
    - containerPort: 3112
    securityContext:
      seLinuxOptions:
        type: goof_security-profiles-operator.process

The full pod manifest using the policy can be found at k8s/goof-selinux.yaml in the repo. So let’s start the pod:
$ kubectl create -f k8s/goof-selinux.yaml

Forward the ports, set up the listener and run the exploit again:
$ kubectl port-forward pod/goof -nsecurity-profiles-operator 8080:3112

Navigate to localhost:8080, click browse, select the payload (exploits/nc-reverse-shell.jpg) and click Resize to submit it.

…and nothing! If we go to the node and inspect its audit.log, we’ll see that the pod tried to do something that the policy didn’t allow it to:

type=AVC msg=audit(1621934085.650:196): avc: denied { name_connect } for pid=58536 comm="nc" dest=3131 scontext=system_u:system_r:goof_security-profiles-operator.process:s0:c285,c1001 tcontext=system_u:object_r:unreserved_port_t:s0 tcl>

Aha! This is the nc binary trying to connect to a port, which the policy does not allow. SELinux saved the day.
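
To pull such denials out of a busy audit log, a small filter helps. The sketch below assumes the audit.log line format shown above. avc_denials is a hypothetical helper that reads log lines on stdin and keeps only the AVC denials whose source context (scontext) contains the given SELinux type.

```shell
#!/bin/sh
# Sketch: filter AVC denials for one SELinux type from audit log lines on stdin.
# Usage on a node (the log path is the usual default, adjust if yours differs):
#   sudo cat /var/log/audit/audit.log | avc_denials goof_security-profiles-operator.process
avc_denials() {
    # index() does plain substring matching, so the dot in the type name
    # is not treated as a regex wildcard.
    awk -v t=":$1:" '
        /type=AVC/ && /denied/ {
            for (i = 1; i <= NF; i++) {
                if (index($i, "scontext=") == 1 && index($i, t) > 0) {
                    print
                    next
                }
            }
        }'
}
```

On nodes with the audit tools installed, ausearch -m avc -ts recent gives a similar view of recent denials without any custom filtering.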

Conclusion

The Security Profiles Operator doesn’t bring any new hardening techniques on its own. Both SELinux and seccomp have been supported by container runtimes and Kubernetes for some time. What SPO does bring is the ability to treat security profiles as first-class objects with a status, to record workloads in order to create profiles without having to analyze the program line by line, and to bind workloads to profiles.

The SPO upstream has created documents with user stories and personas that hopefully illustrate better who SPO is aimed at.

As can be seen from the GitHub issues, SPO development is far from over. If you’re interested in the world of usable container security, please come over to the #security-profiles-operator channel on the Kubernetes Slack, try the operator and give feedback.
