secure_workloads_kubernetes

6 June 2023

Kubernetes is a container orchestration platform that makes it easy to manage containerized workloads. However, it poses new security challenges, as Kubernetes users are responsible for both the applications running in clusters and the underlying infrastructure.

Why should you care about the security of your workloads?

Kubernetes workloads are often exposed to the internet, which makes them attractive targets for attackers seeking initial access to a Kubernetes infrastructure. Unfortunately, no web framework is free of bugs or vulnerabilities, and the same goes for the Docker images used to package applications.

Therefore, it is essential to harden the security of workloads to prevent the compromise of one application from leading to a full compromise of the underlying Kubernetes infrastructure.

This article provides recommendations to mitigate these risks. They are mainly taken from the NSA's "Kubernetes Hardening Guide" and from the Kubernetes security documentation. The article does not cover Role-Based Access Control (RBAC) in Kubernetes.

Securing Pods

Securing Pods reduces the overall attack surface in the cluster and prevents many post-exploitation activities after a compromise.

Run applications in containers as non-root users

Why run applications as non-root?

Most container services run applications as root users by default. Yet, most of the time, applications do not require such high permissions.

Running applications as non-root users mitigates the impact of a container compromise, as it limits the rights these applications have in the containers. In addition, container engines (the services that manage containers on a Kubernetes node) are also prone to security flaws that can break the isolation between containers and the host system.

In this scenario, an attacker who managed to compromise a container (by exploiting an application vulnerability, for instance) could escape from it and end up with the same privileges on the host system as they have in the container.

How to change the user in containers?

There are two ways to run containers with a non-root user:

  • Specifying a non-root user to use in the Dockerfile

    # Create a new group (group1) and a new user (user1) in that group; then switch to that user's context
    RUN groupadd group1 && useradd -g group1 user1
    USER user1:group1
  • Using Pod security contexts to specify a non-root user at runtime

    apiVersion: v1
    kind: Pod
    metadata:
       name: my-secure-pod
    spec:
       ...
       securityContext:
          runAsUser: 1001
          runAsNonRoot: true

Specifying the user in the Dockerfile is to be preferred, as it ensures the container will always be run without root rights by container engines.
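
As a fuller illustration of the runtime option, the sketch below shows a complete Pod manifest. The container name, the image reference (registry.example.com/my-app:1.0.0), and the numeric IDs are placeholders chosen for this example, not values from a real deployment. With runAsNonRoot set to true, the kubelet refuses to start a container that would run as UID 0.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  # Pod-level settings apply to every container unless a container overrides them
  securityContext:
    runAsUser: 1001      # UID used for the container processes (placeholder)
    runAsGroup: 1001     # primary GID used for the container processes (placeholder)
    runAsNonRoot: true   # refuse to start containers that would run as UID 0
  containers:
    - name: my-app                              # hypothetical container name
      image: registry.example.com/my-app:1.0.0  # hypothetical image
```

The same fields can also be set in a container's own securityContext, which then takes precedence over the Pod-level values for that container.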

Immutable container file systems

Why use an immutable container file system?

An attacker who has compromised an application and gained code execution can create or download files, or modify applications. Kubernetes can lock down the container's file system to prevent many post-exploitation activities.

⚠️ Note that these restrictions also affect legitimate applications running in containers and can result in crashes or abnormal behaviors.

How to apply these limitations?

To enable these limitations, the property securityContext.readOnlyRootFilesystem needs to be set to true in the container specification. This property only exists at the container level, not in the Pod-level securityContext.

apiVersion: v1
kind: Pod
metadata:
   name: my-secure-pod
spec:
   containers:
      - name: my-container
        ...
        securityContext:
           readOnlyRootFilesystem: true
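
Applications that still need to write temporary files can be given a dedicated writable volume while the rest of the file system stays read-only. The following is a minimal sketch, assuming the application only writes to /tmp; the container name, image, and volume name are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  containers:
    - name: my-app                              # hypothetical container name
      image: registry.example.com/my-app:1.0.0  # hypothetical image
      securityContext:
        readOnlyRootFilesystem: true            # root file system is mounted read-only
      volumeMounts:
        - name: tmp
          mountPath: /tmp                       # only this path is writable (assumption)
  volumes:
    - name: tmp
      emptyDir: {}                              # scratch space deleted with the Pod
```

An emptyDir volume is removed when the Pod is deleted, so anything an attacker writes there does not survive a restart of the workload.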

Build and run secure images

Why does it matter?

Building secure container images in the first place helps reduce the security flaws attackers can leverage to gain access to containers. In addition, using minimal images with only the necessary services allows for reducing the attack surface and the tools attackers can leverage from within a container.

What can I do to build secure images?

Two mechanisms can be leveraged to build secure images:

  • Use minimal base images, like scratch (when possible) or alpine-based images
  • Scan images to detect outdated dependencies and libraries, known vulnerabilities, or misconfigurations. To be the most efficient, scans need to be performed at build time, at rest on the container registry images on a regular schedule, and at run time on the images of active workloads.

To make sure the Docker images run by Pods are secure, it is possible to only accept signed images in the cluster (using GKE built-in Binary Authorization or a dedicated admission controller for instance) that come from trusted repositories and that passed vulnerability checks.
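
As one possible illustration of the "dedicated admission controller" approach, the sketch below uses a Kyverno ClusterPolicy to reject Pods whose images do not come from a trusted registry. It assumes Kyverno is installed in the cluster, and registry.example.com is a placeholder. The layout follows Kyverno's kyverno.io/v1 policy format; field names and placement (for example validationFailureAction) have changed between Kyverno releases, so it should be checked against the installed version. It only restricts the image source and does not verify signatures.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce   # reject non-compliant Pods instead of only reporting them
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.example.com."
        pattern:
          spec:
            containers:
              # every container image must match this wildcard (placeholder registry)
              - image: "registry.example.com/*"
```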

Restrict capabilities

What are capabilities?

Linux divides the privileges traditionally associated with the superuser into distinct units, called capabilities, which can be independently enabled and disabled. The kernel checks these capabilities during system calls to decide whether a program may perform privileged operations. The container runtime used by Kubernetes (commonly containerd) is a privileged process that spawns containers with a predefined set of capabilities.

Why restrict capabilities?

The privileged operations an attacker can perform inside a container are limited by the capabilities the container process has. Thus, many post-exploitation activities can be prevented by restricting them.

How to restrict the capabilities of containers?

Container capabilities can be limited with the container-level parameter securityContext.capabilities. It is good practice to drop all capabilities using the keyword ALL and then add back only the ones that are strictly necessary.

apiVersion: v1
kind: Pod
metadata:
   name: my-secure-pod
spec:
   containers:
      - name: my-container
        ...
        securityContext:
           capabilities:
              drop: ["ALL"]
              add:
                 - MKNOD
                 - NET_RAW

⚠️ Note that legitimate programs need capabilities to perform operations, and so restricting them can impact applications.

Use Seccomp profiles

What is Seccomp?

Seccomp is a Linux kernel feature that restricts the system calls a program can make. It can be used to sandbox the privileges of a process.

Why use Seccomp?

The privileged operations an attacker can perform inside a container are limited by the system calls they are allowed to make. Many post-exploitation activities can be prevented by limiting them.

How to use Seccomp profiles?

The Seccomp profiles to apply on the Pod containers can be configured with the property securityContext.seccompProfile. Kubernetes allows specifying two kinds of profiles:

  • RuntimeDefault, which uses the default seccomp profile provided by the container runtime (containerd ships its own default profile). Nodes can be configured to apply it to all containers by default, and cloud providers usually activate it automatically on nodes.

    apiVersion: v1
    kind: Pod
    metadata:
       name: my-secure-pod
    spec:
       ...
       securityContext:
          seccompProfile:
             type: RuntimeDefault
  • Localhost, which uses a profile loaded on the node host. The profile can be loaded beforehand using a DaemonSet, for instance. The path given in localhostProfile is relative to the kubelet's seccomp directory.

    apiVersion: v1
    kind: Pod
    metadata:
       name: my-secure-pod
    spec:
       ...
       securityContext:
          seccompProfile:
             type: Localhost
             localhostProfile: "profiles/my-custom-profile.json"

Use AppArmor or SELinux

What is AppArmor?

AppArmor is a Linux kernel security module that can restrict the capabilities of running processes and limit their access to files. With this module, each process can have its own security profile.

If you want more information on AppArmor, this article explains how to build AppArmor profiles specifically for Docker containers.

Why use AppArmor?

AppArmor restricts the activities an attacker can perform in a container by limiting both the file access and the Linux capabilities of the container process. On the capabilities side, it overlaps with the Kubernetes capabilities feature described above.

How to apply AppArmor profiles?

The module first needs to be activated on the nodes’ OS (cloud providers usually provide optimized OS for Kubernetes that can ship with AppArmor). In addition, to use custom profiles, profiles need to be loaded on the node using a DaemonSet for instance.

An AppArmor profile can be applied to a container by adding the annotation container.apparmor.security.beta.kubernetes.io/<container_name> to the Pod's metadata. Kubernetes allows applying two types of profiles:

  • runtime/default to apply the runtime default profile (see containerd default profile). containerd automatically applies this profile by default when AppArmor is enabled on the node.

    apiVersion: v1
    kind: Pod
    metadata:
       name: my-secure-pod
       annotations:
          container.apparmor.security.beta.kubernetes.io/my-container: runtime/default
    spec:
       ...
  • localhost/<profile_name> to apply a profile that was loaded on the node beforehand

    apiVersion: v1
    kind: Pod
    metadata:
       name: my-secure-pod
       annotations:
          container.apparmor.security.beta.kubernetes.io/my-container: localhost/my-profile
    spec:
       ...

What about SELinux?

SELinux can also be used to secure Pods through the securityContext property. SELinux provides functionality similar to AppArmor. However, it is generally considered harder to learn but more secure than AppArmor.

Note that AppArmor and SELinux cannot be used at the same time on a system.
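
For reference, SELinux labels are assigned through the seLinuxOptions field of the securityContext. The sketch below is a minimal example, assuming SELinux is enabled on the node's operating system; the level value is only illustrative, and the container name and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"                     # illustrative MCS level applied to all containers
  containers:
    - name: my-app                              # hypothetical container name
      image: registry.example.com/my-app:1.0.0  # hypothetical image
```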

Protect Pod service account tokens

Why pay special attention to Pod service accounts?

Service accounts are critical elements of Kubernetes infrastructure, as they are used by Pods to interact with the Kubernetes API. Thus, it is important to only grant them the necessary rights and to never rely on default service accounts that are auto-mounted in all Pods by default.

What to keep in mind?

  • Never use default service accounts unless you have no other choice

  • Respect the principle of least privilege, and only grant the necessary permissions to custom service accounts

  • Disable service account auto-mount in pods that do not need to interact with the Kubernetes API. The auto-mount can be disabled by using the property automountServiceAccountToken.

    apiVersion: v1
    kind: Pod
    metadata:
       name: my-secure-pod
    spec:
       ...
       automountServiceAccountToken: false
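
The first two points can be combined by giving each workload its own service account. The following sketch assumes a namespace called my-secure-namespace and a workload called my-app; both names and the image are placeholders. The auto-mount is disabled on the service account itself, so Pods using it get no token unless they explicitly opt in.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app                                  # dedicated account (hypothetical name)
  namespace: my-secure-namespace
automountServiceAccountToken: false             # no token mounted by default for this account
---
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
  namespace: my-secure-namespace
spec:
  serviceAccountName: my-app                    # use the dedicated account, not "default"
  containers:
    - name: my-app
      image: registry.example.com/my-app:1.0.0  # hypothetical image
```

When automountServiceAccountToken is set on both the service account and the Pod, the Pod-level value takes precedence.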

Pod security enforcement

What is Pod security enforcement?

Enforcing Pod security consists of preventing Pods that do not comply with a predefined baseline security policy from running in the cluster. It ensures that workloads run in Pods that meet part of the security criteria above.

Why enforce a baseline security policy for Pods?

Pod security enforcement ensures that Pods with insecure specifications cannot be deployed in the cluster.

How to implement Pod security enforcement?

Kubernetes has a built-in Pod Security Admission controller (enabled by default since Kubernetes 1.23 and stable since 1.25) that checks the compliance of Pod specifications with the predefined Pod Security Standards (privileged, baseline, or restricted), which define different isolation levels for Pods. It only requires labeling Namespaces with the Pod Security Standard level to check against. The controller can enforce the policy and reject all Pods that violate it, audit Pod compliance and add an annotation to the audit log when a Pod's specification violates the policy, or warn the user when the Pod is created. For instance, to enforce the restricted policy:

apiVersion: v1
kind: Namespace
metadata:
   name: my-secure-namespace
   labels:
      # pod-security.kubernetes.io/<MODE>: <LEVEL>
      pod-security.kubernetes.io/enforce: restricted
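
Several modes can be combined on the same Namespace, which helps when migrating existing workloads. The sketch below is one possible rollout setup, not a recommendation from the guides cited above: it enforces the baseline level while auditing and warning against the stricter restricted level. In the Kubernetes documentation these settings are Namespace labels.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-secure-namespace
  labels:
    # Pods violating "baseline" are rejected
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    # Pods violating "restricted" are still admitted, but recorded in the audit log
    pod-security.kubernetes.io/audit: restricted
    # ...and the user creating them receives a warning
    pod-security.kubernetes.io/warn: restricted
```

Once the audit log and warnings are clean, the enforce label can be raised to restricted.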

Instead of this controller, policy controllers like Kyverno can be used to perform similar checks. These controllers can also mutate Pods to add security constraints directly during resource admission.
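
As an illustration, the following Kyverno ClusterPolicy is a sketch of a validation rule that rejects Pods whose containers do not use a read-only root file system. It assumes Kyverno is installed; the policy and rule names are placeholders, and field names such as validationFailureAction have changed between Kyverno releases, so the manifest should be checked against the installed version.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-ro-rootfs
spec:
  validationFailureAction: Enforce   # reject non-compliant Pods instead of only reporting them
  rules:
    - name: validate-read-only-root-filesystem
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Root file system must be read-only."
        pattern:
          spec:
            containers:
              # every container must set readOnlyRootFilesystem to true
              - securityContext:
                  readOnlyRootFilesystem: true
```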

Isolating workloads

Isolating workloads is essential to limit lateral movement within clusters, and to prevent a compromised workload from impacting other workloads.

Use Namespace segregation

Why are Namespaces important?

Kubernetes Namespaces allow a logical partition of cluster resources. Namespaces do not automatically isolate workloads and applications, but numerous resources apply to the scope of Namespaces.

In particular, the resources described in the next sections apply to Namespaces, both to isolate them from each other and to isolate workloads within them. Roles and RoleBindings used for RBAC also apply to the Namespace scope.

Use NetworkPolicies

Why use NetworkPolicies?

Traffic between Pods, Namespaces, and external IP addresses can be controlled with NetworkPolicies. By default, there is no restriction on ingress and egress traffic in the cluster. Thus, without NetworkPolicies, an attacker who compromises a container can reach all other Pods and Services and potentially move laterally within the cluster.

How to use NetworkPolicies?

NetworkPolicies require a Kubernetes network plugin that supports them (for instance Calico). To secure your cluster as much as possible, the best practice is to respect the principle of least privilege by only authorizing legitimate network flows. Network policies do not conflict since they are additive.

A good practice is to start by denying all ingress traffic. NetworkPolicies are namespaced, so the policy below has to be created in each Namespace to protect.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress

Then, allow only the necessary traffic with additional network policies. It is possible to go even further by denying all egress traffic from Pods and only allowing the necessary communications.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-egress
spec:
  podSelector: {}
  policyTypes:
    - Egress
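
A default-deny egress policy also blocks DNS lookups, so name resolution usually has to be allowed again explicitly. The sketch below is one way to do this, assuming the cluster DNS Pods run in the kube-system Namespace and carry the label k8s-app: kube-dns, which is common but not guaranteed; the prod Namespace is a placeholder.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: prod                    # hypothetical Namespace
spec:
  podSelector: {}                    # applies to every Pod in the Namespace
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in the same item are combined (AND)
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns      # assumed label of the cluster DNS Pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```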

Example of a NetworkPolicy that only allows ingress traffic to the nginx Pods from Pods labeled access: "true":

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: example-access-nginx
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: nginx
  ingress:
    - from:
        - podSelector:
            matchLabels:
              access: "true"
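
Rules can also be narrowed to a given source Namespace and destination port. The following sketch assumes a source Namespace named frontend and an nginx server listening on TCP port 80; both are placeholders. It relies on the kubernetes.io/metadata.name label that recent Kubernetes versions set automatically on every Namespace.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-nginx
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
    - Ingress
  ingress:
    - from:
        # only Pods labeled access: "true" AND located in the frontend Namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: frontend   # hypothetical Namespace
          podSelector:
            matchLabels:
              access: "true"
      ports:
        - protocol: TCP
          port: 80                                    # assumed listening port
```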

Add ResourceQuotas for requests and limits

Why set ResourceQuotas?

ResourceQuotas restrict the total amount of resources that Pods in a Namespace can request. If they are used to limit compute resources, all Pods must define requests and limits; otherwise, the quota system may reject Pod creation. ResourceQuotas help avoid resource exhaustion, in particular by preventing user applications from monopolizing resources and keeping Kubernetes system Pods from running correctly.

How to use ResourceQuotas?

It is good practice to use ResourceQuotas to:

  • Force every container to specify requests and limits for resources.
  • Restrict the aggregated requests and limits of Pods in a Namespace.

apiVersion: v1
kind: ResourceQuota
metadata:
 name: my-secure-namespace
spec:
 hard:
   requests.cpu: "1"
   requests.memory: 1Gi
   limits.cpu: "2"
   limits.memory: 2Gi
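
With such a quota in place, a Pod is only admitted if its containers declare requests and limits that fit within the remaining budget. The sketch below shows a Pod that would fit; the Namespace, container name, image, and resource values are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-secure-pod
  namespace: my-secure-namespace                # Namespace holding the quota (assumption)
spec:
  containers:
    - name: my-app                              # hypothetical container name
      image: registry.example.com/my-app:1.0.0  # hypothetical image
      resources:
        requests:                               # counted against requests.cpu / requests.memory
          cpu: 250m
          memory: 256Mi
        limits:                                 # counted against limits.cpu / limits.memory
          cpu: 500m
          memory: 512Mi
```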

LimitRange resources can also be used to set default requests and limits on containers that do not specify them, and to bound the range of values allowed for each container or Pod.
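
A minimal LimitRange could look like the following sketch. The name, Namespace, and values are placeholders to adapt to the workloads running in the Namespace.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits               # hypothetical name
  namespace: my-secure-namespace
spec:
  limits:
    - type: Container
      defaultRequest:                # requests applied when a container sets none
        cpu: 100m
        memory: 128Mi
      default:                       # limits applied when a container sets none
        cpu: 250m
        memory: 256Mi
      max:                           # upper bound a container may ask for
        cpu: 500m
        memory: 512Mi
```

Because defaults are injected at admission time, Pods that omit requests and limits are no longer rejected by a compute ResourceQuota in the same Namespace.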

Conclusion

All these recommendations take quite some time to implement. You will probably not follow all of them, depending on your needs. However, some of them are quick wins that can greatly improve the security of your cluster and your workloads.

In particular, building secure Docker images, taking extra care with Service Accounts, and using the default Seccomp and AppArmor profiles are good first steps.

However, taking extra precautions to secure workloads does not guarantee that an attacker will never find a vulnerability in your applications or cluster. Therefore, monitoring your cluster for anomalies that may result from a compromise is also critical to securing your infrastructure in depth.

For example, Falco is an intrusion detection tool that integrates well with Kubernetes to provide such functionality.