9 May 2023
Kubernetes is a critical component of many infrastructures, which makes good security practices mandatory. However, the Kubernetes control plane does not offer a way to define strict security policies on its own. In our view, Kyverno is the best tool for enforcing security rules.
Throughout this article, we will use the vocabulary associated with Kyverno resources: Policy, Rule, etc.
Kyverno is a policy engine for Kubernetes. It allows you to:
- Define policies as Kubernetes resources;
- Validate, modify, or generate resources on the fly via these policies;
- Block non-compliant resources with an admission controller;
- Log policy violations in reports.
Its main advantages:
- Define security policies to prohibit the creation of insecure resources;
- Simplify the life of Ops teams via on-the-fly resource mutations;
- Policies can be configured in audit mode (non-blocking) or enforce mode;
- Policy writing is simple (compared to Gatekeeper in particular).
Its main drawbacks:
- It is difficult to create policies with very specific and/or complex logic;
- Kyverno is a single point of failure. Some people know the dark side of admission controllers: if the Kyverno pods are no longer available, no more Kubernetes resources can be deployed on the cluster. I'll give you some tips to avoid this problem later in this article.
Kyverno runs as a dynamic admission controller in the Kubernetes cluster.
The Kyverno webhook receives requests from the API server during the "validating admission" and "mutating admission" steps.
Policy & Rule
A Kyverno Policy is composed of the following fields (for more info: `kubectl explain policy.spec`):
- `rules`: one or more rules that define the policy;
- `background`: if true, the policy also applies to all existing Kubernetes resources in the cluster; otherwise it applies only to new resources;
- `validationFailureAction`: the action mode of the policy: `audit` or `enforce`.
A Rule contains the following fields (for more info: `kubectl explain policy.spec.rules`):
- `match`: selects the resources the rule applies to;
- `exclude` (optional): excludes resources from the selection;
- `mutate`, `validate`, `generate`, or `verifyImages`: depending on the type of policy, mutates, validates, or generates a resource, or verifies the signature of an image (in beta).
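Put together, a minimal validation policy using these fields could look like the following sketch (the policy name, label key, and message are illustrative, not taken from a real cluster):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label      # illustrative name
spec:
  validationFailureAction: audit  # audit or enforce
  background: true                # also check existing resources
  rules:
    - name: check-team-label
      match:                      # select the resources
        any:
          - resources:
              kinds:
                - Pod
      exclude:                    # optionally exclude some of them
        any:
          - resources:
              namespaces:
                - kube-system
      validate:                   # the rule type: validate/mutate/generate/verifyImages
        message: "The label `team` is required."
        pattern:
          metadata:
            labels:
              team: "?*"          # any non-empty value
```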
Audit vs Enforce
Kyverno has two modes of operation:
- audit: does not block any deployment, but generates a report indicating when the specified policies are not respected, and why;
- enforce: completely blocks the creation of resources that do not respect the policies.
Policy Reports are Kubernetes resources that can be listed simply:
```shell
kubectl get policyreport -A
```
For a given namespace, we can list policy violations with the command:

```shell
kubectl describe polr polr-ns-default | grep "Result: \+fail" -B10
```
Kyverno can be installed on clusters via a simple Helm chart. Nothing could be simpler; that's the power of Kubernetes:
```shell
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno --namespace kyverno --create-namespace --values values.yaml
```
Here are the important points to consider in the chart's values:
```yaml
---
# 3 replicas for High Availability
replicaCount: 3
# Necessary in EKS with custom Network CNI plugin
# https://cert-manager.io/docs/installation/compatibility/#aws-eks
hostNetwork: true
config:
  webhooks:
    # Exclude namespaces from scope
    - namespaceSelector:
        matchExpressions:
          - key: kubernetes.io/metadata.name
            operator: NotIn
            values:
              - kube-system
              - kyverno
              - calico-system
    # Exclude objects from scope
    - objectSelector:
        matchExpressions:
          - key: webhooks.kyverno.io/exclude
            operator: DoesNotExist
```
Some remarks about the installation :
- Access to the host network is required if you use EKS with a custom CNI plugin
- Kyverno must be configured with at least 3 replicas to ensure high availability
- The namespaces `kube-system`, `kyverno`, and `calico-system` are whitelisted in order not to block the deployment of critical Kubernetes resources (kube-proxy, weave, ...).
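The `objectSelector` in the values above also lets you exclude individual objects from the webhook: any resource carrying the `webhooks.kyverno.io/exclude` label is ignored by Kyverno. A hypothetical example (the workload name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-component  # hypothetical workload
  labels:
    # The mere presence of this label (whatever its value) takes the object
    # out of Kyverno's webhook scope, per the objectSelector configured above
    webhooks.kyverno.io/exclude: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: critical-component
  template:
    metadata:
      labels:
        app: critical-component
    spec:
      containers:
        - name: app
          image: registry.domain.com/app:latest
```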
Example of policy
A list of simple examples is provided in the Kyverno documentation.
I'd like to present a slightly more advanced use case: dynamic RBAC rights management. Here is the use case we encountered: we set up on-the-fly development environments in Kubernetes at a customer's site.
We allowed developers, via a Gitlab CI job, to test their applications in environments created on the fly. These environments are in dedicated namespaces also created on the fly.
How do you provide the associated GitLab runner with RBAC rights on namespaces that don't exist yet? Kubernetes does not allow this via RBAC alone, but with Kyverno it is very simple.
All you need to do is:
- Give the runner the RBAC rights to create namespaces;
- Give it RBAC rights on each new namespace via a Kyverno Policy: a generation rule can simply create a RoleBinding in reaction to the namespace creation.
Here are the implementation details:
- The Kubernetes service account `gitlab-runner-ephemeral-env` is only allowed to create namespaces:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-runner-ephemeral-env
  labels:
    app: gitlab-runner-ephemeral-env
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitlab-runner-ephemeral-env
  labels:
    app: gitlab-runner-ephemeral-env
rules:
  - apiGroups: ["*"]
    resources: ["namespaces"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-runner-ephemeral-env
  labels:
    app: gitlab-runner-ephemeral-env
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gitlab-runner-ephemeral-env
subjects:
  - kind: ServiceAccount
    name: gitlab-runner-ephemeral-env
    namespace: gitlab
```
- When a namespace is created, a RoleBinding between it and the ClusterRole `cluster-admin` is generated via a Kyverno ClusterPolicy:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-rbac-rules-env-volee
  annotations:
    policies.kyverno.io/title: Add RBAC permissions for ephemeral environments.
    policies.kyverno.io/category: Multi-Tenancy
    policies.kyverno.io/subject: RBAC
    policies.kyverno.io/description: >-
      Add RBAC rules when a namespace is created by a specific gitlab runner
      (gitlab-runner-env-volee), useful for ephemeral environments.
spec:
  background: false
  rules:
    - name: create-rbac
      match:
        resources:
          kinds:
            - Namespace
        subjects:
          - kind: ServiceAccount
            name: gitlab-runner-ephemeral-env
            namespace: gitlab
      generate:
        kind: RoleBinding
        name: ephemeral-namespace-admin
        # Generate the RoleBinding inside the newly created namespace
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          subjects:
            - kind: ServiceAccount
              name: gitlab-runner-ephemeral-env
              namespace: gitlab
          roleRef:
            kind: ClusterRole
            name: cluster-admin
            apiGroup: rbac.authorization.k8s.io
```
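For illustration, if the runner creates a namespace named `feature-42` (a hypothetical name), the generate rule produces a RoleBinding equivalent to:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ephemeral-namespace-admin
  namespace: feature-42  # the newly created namespace (hypothetical name)
subjects:
  - kind: ServiceAccount
    name: gitlab-runner-ephemeral-env
    namespace: gitlab
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```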
Limitations of Kyverno
In this part, I will detail several problems we encountered when implementing Kyverno. Besides the fact that Kyverno is a SPOF for all the namespaces it monitors, policies are quite complicated to write and debug. Not to mention that Kyverno can have side effects with other tools such as ArgoCD.
Policies are complex to write
Overall, Kyverno policies can be quite difficult to write. The documentation has many examples, but the whole mechanism of filtering and mutating resources can be a bit confusing at first.
Let's take a live example. We want to disallow the `privileged: true` parameter, except for two types of pods:
- Pods in the `debug` namespace;
- Pods in the `gitlab` namespace whose name starts with `runner-`.
Following the documentation, we are tempted to write the following policy:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
  annotations:
    policies.kyverno.io/category: Pod Security Standards (Baseline)
    policies.kyverno.io/severity: medium
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/description: >-
      Privileged mode disables most security mechanisms and must not be
      allowed. This policy ensures Pods do not call for privileged mode.
spec:
  validationFailureAction: audit
  background: true
  rules:
    - name: privileged-containers
      match:
        resources:
          kinds:
            - Pod
      exclude:
        any:
          - resources:
              namespaces:
                - "debug" # Whitelisting
          - resources:
              namespaces:
                - "gitlab"
              names:
                - "runner-*"
      validate:
        message: >-
          Privileged mode is disallowed. The fields
          spec.containers[*].securityContext.privileged and
          spec.initContainers[*].securityContext.privileged must not be set
          to true.
        pattern:
          spec:
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
```
This policy does not work: the filtering mechanism is not effective. After some research, here is the fix to apply:
```diff
18,20c18,21
<       resources:
<         kinds:
<           - Pod
---
>       all:
>         - resources:
>             kinds:
>               - Pod
```
There is no indication in the documentation of a change in behavior between these two ways of filtering resources. It is not easy to debug a policy that doesn't work... Fortunately, the community is active, and someone quickly proposed the solution on Slack.
Beware of Mutation Webhooks
From experience, one should always be careful with mutation webhooks, which can be confusing for DevOps teams. Kubernetes mutating webhooks inherently induce a difference between the resources as specified and the resources actually deployed on the cluster.
If an Ops is not aware of the existence of these mutations, they can waste a lot of time understanding why a particular resource appears or has certain attributes.
Similarly, if a cluster has too many mutation policies, there may be incompatibilities between policies, or side effects that are difficult to identify.
I recommend using mutation webhooks sparingly and documenting them very clearly. They can be extremely useful (e.g., adding the address of an HTTP proxy as an environment variable for all pods in a namespace), but it is best not to abuse them.
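As an illustration of that HTTP proxy example, here is a sketch of such a mutation policy (the policy name, target namespace, and proxy address are hypothetical):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: inject-http-proxy  # hypothetical name
spec:
  rules:
    - name: add-proxy-env
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - behind-proxy  # hypothetical namespace
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # The (name): "*" anchor applies the patch to every container
              - (name): "*"
                env:
                  - name: HTTP_PROXY
                    value: "http://proxy.internal.domain.com:3128"
```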
Side effects with ArgoCD
We have also encountered some difficulties with Kubernetes clusters whose CD is managed via ArgoCD.
When a Kyverno policy is created that targets a resource that deploys containers, such as Pods, Kyverno intelligently rewrites the rules so that the policy takes into account all the Kubernetes resource types that deploy containers.
For example, if we create this policy:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: enforce
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images may only come from our internal enterprise registry."
        pattern:
          spec:
            containers:
              - image: "registry.domain.com/*"
```
Kyverno will modify the policy on the fly via a mutating webhook, like this:
```yaml
spec:
  background: true
  failurePolicy: Fail
  rules:
    - match:
        any:
          - resources:
              kinds:
                - Pod
      name: validate-registries
      validate:
        message: Images may only come from our internal enterprise registry.
        pattern:
          spec:
            containers:
              - image: registry.domain.com/*
    - match:
        any:
          - resources:
              kinds:
                - DaemonSet
                - Deployment
                - Job
                - StatefulSet
      name: autogen-validate-registries
      validate:
        message: Images may only come from our internal enterprise registry.
        pattern:
          spec:
            template:
              spec:
                containers:
                  - image: registry.domain.com/*
    - match:
        any:
          - resources:
              kinds:
                - CronJob
      name: autogen-cronjob-validate-registries
      validate:
        message: Images may only come from our internal enterprise registry.
        pattern:
          spec:
            jobTemplate:
              spec:
                template:
                  spec:
                    containers:
                      - image: registry.domain.com/*
  validationFailureAction: enforce
```
What happens if the Kyverno policy was created via Argo? Argo detects a difference between the YAML file of the declared policy and the resource actually deployed in the cluster. There is then a constant back-and-forth between Argo and Kyverno, each modifying the Kyverno policy in turn.
To tell Argo that these changes should be ignored, it is sufficient to use the `ignoreDifferences` keyword in the Argo Application:
```yaml
ignoreDifferences:
  # Kyverno auto-generates rules to make policies smarter. We want ArgoCD to
  # ignore the auto-generated rules.
  # For more information: https://kyverno.io/docs/writing-policies/autogen/
  - group: kyverno.io
    kind: ClusterPolicy
    jqPathExpressions:
      - .spec.rules[] | select( .name | startswith("autogen-") )
```
Now you know what Kyverno is, how to install it, and how to use it to secure your Kubernetes cluster! Once again: use mutation webhooks sparingly, test your policies in audit mode first, and don't hesitate to reach out to the community if you run into problems.