policy-as-code

3 August 2023

Policy as Code (PaC) is an increasingly popular methodology in DevOps. It addresses the need for control over ever more quickly evolving infrastructures. This article will give you guidance on how to build a PaC system that encourages teams to adopt it while not affecting their productivity.

First, some definitions

What is a Policy?

Policies are rules that ensure the appropriate, efficient, and secure use of an organization's information technology resources. We can distinguish two types of policies:

  • Organizational policies that define frameworks and procedures to guarantee quality, system security, or compliance with regulations. They’re often defined by security and compliance teams.
  • Technical policies that translate and apply company policies to IT systems. Policies differ from configuration in that they are intended to apply to an element that is external to (or at least different from) the component evaluating the policy.

For example, an organizational policy might be “All deployments must be validated by the QA and security team before deployment in production”. Firewall policies, on the other hand, are examples of technical policies.

What is Policy as Code and what are its benefits?

Policy As Code (PaC) is the process of translating policies into a piece of code. Both organizational and technical policies can be converted into PaC.

The benefits of Policy as Code are similar to those of Infrastructure as Code:

  • Visibility: Defining policies with code offers an opportunity to provide a technical measurement of compliance with a policy or procedure. The policy evaluation result can then be used as evidence during an audit.
  • Automation: Policy enforcement has traditionally involved manually applied procedures and audits to verify compliance with policies. This process can be time-consuming and error-prone, particularly in large and complex environments. Configuration pieces may be scattered across different files, systems, or code repositories in these environments. By using PaC, it is possible to design a system that will automatically and continuously evaluate policies.
  • Consistency: PaC makes it easy to reuse policies across multiple environments and infrastructures. Compared with manual policy enforcement, it also reduces the risk of human error and the security risks that come with it.
  • Fast deployment: By reducing the manual steps in applying policies, PaC enables new versions of policies to be deployed more easily and frequently. The GitOps approach can also be applied to policies as code, allowing for automated deployment of updates.
  • Versioning: Version Control Systems (VCS) allow managing and controlling policy versions, making it easier to maintain and track changes over time. This ensures that policies remain consistent and reliable, even as they evolve and change.

How to implement Policy As Code?

Generic principle

Policy as Code’s operating principle is quite simple. It relies on a main component that we will call a policy engine. This engine is in charge of running a query on a set of policies and data to provide an answer.

Queries generally either ask whether the input data conforms to the policy, or request the set of policies that the input passes or fails. The answer of the policy engine may then be used to authorize an action or to generate a report on policy compliance.

[Figure: policy-engine]

The policy engine usually defines the policy language used to create policies. In practice, some data is often passed along with the query, while external data sources are queried to handle more complex policy validation.
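The principle described above can be sketched in a few lines of Python. This is only an illustrative model, not how OPA or Sentinel are implemented: the `evaluate` function, the `Decision` type, and the two example policies (`uses_vcs`, `approved_region`) are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

# A policy is a predicate over the query input and some external data.
Policy = Callable[[Mapping[str, Any], Mapping[str, Any]], bool]


@dataclass(frozen=True)
class Decision:
    allowed: bool      # True only if every policy passed
    passed: list[str]  # names of the policies the input satisfies
    failed: list[str]  # names of the policies the input violates


def evaluate(policies: Mapping[str, Policy],
             query_input: Mapping[str, Any],
             data: Mapping[str, Any]) -> Decision:
    """Evaluate every policy against the input and data, then aggregate."""
    passed, failed = [], []
    for name, policy in policies.items():
        (passed if policy(query_input, data) else failed).append(name)
    return Decision(allowed=not failed, passed=passed, failed=failed)


# Hypothetical policies and data, for illustration only.
POLICIES = {
    "uses_vcs": lambda inp, data: inp.get("vcs_repo") is not None,
    "approved_region": lambda inp, data: inp.get("region") in data["allowed_regions"],
}
DATA = {"allowed_regions": {"eu-west-1", "eu-west-3"}}

decision = evaluate(POLICIES, {"vcs_repo": "acme/infra", "region": "us-east-1"}, DATA)
print(decision.allowed, decision.failed)
```

The same `Decision` object can serve both purposes mentioned above: `allowed` to authorize an action, and the `passed` and `failed` lists to feed a compliance report.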

Some Policy As Code implementations

Open Policy Agent (OPA) is an open-source policy agent backed by the Cloud Native Computing Foundation. It uses its own policy language: Rego.

Another engine worth mentioning is HashiCorp's Sentinel, which integrates smoothly with Terraform Cloud and Vault. However, although its language is easy to read, this solution is closed source, and its language may not be as extensive as OPA’s.

Kyverno is a policy agent for Kubernetes focused on security. In a previous article, we presented its capabilities and how it can help secure a Kubernetes cluster.

Enough theory, let's get practical

A simple example

An interesting use case for Policy As Code is policy validation in the CD pipeline. In this example, we will see how to enforce a GitOps approach on Terraform Cloud, using the Sentinel policy engine integrated into the platform.

Let’s consider a policy composed of two rules:

  • The ops team must deploy infrastructure through a dedicated CD platform, so that team members do not need admin privileges on the cloud platform.
  • Any deployed code base must be tracked in a Version Control System (e.g. GitHub).

The first rule can be enforced by using Terraform Cloud and delegating to it the rights to deploy infrastructure. To validate our second requirement, we need a policy, and that’s where Sentinel comes into play.

Sentinel lets you validate policies between the plan and apply phases of a Terraform Cloud run. It can be set up to block the apply operation if the policies are not satisfied:


[Figure: workspace-sentinel]

To translate our policy into code, we will use the Sentinel language. The Sentinel integration in Terraform Cloud exposes some data sources that we can use in our policy code.

The [tfrun](<https://developer.hashicorp.com/terraform/cloud-docs/policy-enforcement/sentinel/import/tfrun>) import exposes metadata about the run and the workspace it targets. We can query its workspace.vcs_repo attribute to validate that a repository is linked to the workspace that is about to be applied:


# tfrun contains information about the workspace that has been planned
import "tfrun"

# Checks if the plan has been run on Terraform Cloud
is_remote_run = rule {
	tfrun.workspace.execution_mode == "remote"
}

# Checks if the workspace is linked with a VCS repository
is_using_vcs = rule {
	tfrun.workspace.vcs_repo is defined
}

# Main rule: the run must be remote and the workspace must be linked to a VCS repository
main = rule {
	is_remote_run and is_using_vcs
}

The goal here is to give you an idea of the possibilities offered by Policy As Code. We won't go into detail about writing or adding policies to Terraform Cloud. If you'd like to deploy the example above, I'll let you refer to HashiCorp's tutorials, which are pretty well done.

When a run is triggered on a workspace that is not linked to a repository, the main rule returns false. Sentinel can then block the apply operation on this workspace:

[Figure: vcs-repo]

Note that the policy is currently running in advisory mode. In this mode, Sentinel does not block the apply operation but gives visibility into policy compliance.
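Terraform Cloud documents three enforcement levels for Sentinel policies: advisory, soft-mandatory, and hard-mandatory. The Python sketch below is a simplified model of how these levels affect a run when a policy fails; the `run_may_proceed` function and its `overridden` parameter are hypothetical names for this illustration, not part of any HashiCorp API.

```python
from enum import Enum


class EnforcementLevel(Enum):
    ADVISORY = "advisory"              # failures are reported, never blocking
    SOFT_MANDATORY = "soft-mandatory"  # failures block unless explicitly overridden
    HARD_MANDATORY = "hard-mandatory"  # failures always block


def run_may_proceed(policy_passed: bool,
                    level: EnforcementLevel,
                    overridden: bool = False) -> bool:
    """Return True if the apply may proceed given the policy result."""
    if policy_passed:
        return True
    if level is EnforcementLevel.ADVISORY:
        return True
    if level is EnforcementLevel.SOFT_MANDATORY:
        return overridden
    return False


for level in EnforcementLevel:
    print(level.value, run_may_proceed(False, level))
```

In this model, moving a policy from advisory to a mandatory level is the only change needed to turn visibility into enforcement.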

Going further

The value of policy engines lies partly in their ability to query external systems to feed their decision-making. To complete the previous example, it would be possible to enrich the information on the VCS repository used, by querying the Terraform Cloud and GitHub APIs. This would make it possible to ensure that the repository is located in the company's GitHub organization or that a code quality workflow has been applied to the code.
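As a rough sketch of the first check, the Python function below tests whether a repository belongs to a given organization. It assumes the repository identifier has the form `org/repo`, and the names `repo_belongs_to_org` and `acme-corp` are hypothetical. It performs no API call; a real policy would obtain the identifier from the platform's data sources.

```python
def repo_belongs_to_org(identifier: str, expected_org: str) -> bool:
    """Check that an 'org/repo' identifier is owned by the expected organization."""
    owner, sep, repo = identifier.partition("/")
    # Reject identifiers that do not have both an owner and a repository part.
    if not sep or not owner or not repo:
        return False
    # Organization names are compared case-insensitively in this sketch.
    return owner.lower() == expected_org.lower()


print(repo_belongs_to_org("acme-corp/infrastructure", "acme-corp"))
print(repo_belongs_to_org("someone-else/infrastructure", "acme-corp"))
```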


[Figure: sentinel-to-github]

By combining data sources, you’ll be able to build complex policies that closely match your business policies and your company processes. For example, to implement organizational processes that require validation by an individual, it may be useful to query ticket management systems such as Jira or Trello.

However, interconnecting a large number of systems can cause problems if one of the data sources becomes unavailable. The policy engine may then be unable to reach a decision on a policy, which could disrupt the workflow of the ops team.
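One way to handle this is to decide explicitly, per policy, whether an unavailable data source should let the operation through (fail open) or block it (fail closed). The following Python sketch illustrates the idea; `DataSourceUnavailable`, `decide`, and `ticket_is_approved` are hypothetical names, and the failing lookup is simulated rather than calling a real ticketing API.

```python
from typing import Callable


class DataSourceUnavailable(Exception):
    """Raised when an external data source cannot be reached."""


def decide(check: Callable[[], bool], fail_open: bool) -> tuple[bool, str]:
    """Run a policy check and fall back to a configured default on outage."""
    try:
        allowed = check()
    except DataSourceUnavailable as exc:
        mode = "open" if fail_open else "closed"
        # The reason is returned so the fallback is visible in the engine output.
        return fail_open, f"data source unavailable ({exc}); failing {mode}"
    return allowed, "policy evaluated"


def ticket_is_approved() -> bool:
    # Simulated outage of a hypothetical ticketing system.
    raise DataSourceUnavailable("ticketing API timed out")


print(decide(ticket_is_approved, fail_open=False))
print(decide(ticket_is_approved, fail_open=True))
```

Failing closed is the safer default for security-critical policies, while failing open may be acceptable for advisory checks where blocking the ops team would cost more than a missed report.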

Five best practices when implementing Policy As Code

As we’ve just seen, we might run into some pitfalls when implementing Policy As Code. In this last section, I’ll advise you on how to avoid them based on our experience at Padok.

  • Start small: A common pitfall when implementing Policy As Code is to try to enforce every policy as soon as possible. Instead, it is recommended to start with a small scope and focus on the most relevant policies.

    Several Policy As Code implementations allow setting policies in audit mode, preventing them from actually blocking any process they are tasked to validate. It is usually a good practice to start with this mode and iterate to improve developer experience and fix unexpected issues.

  • Use specialized tools as external data sources: Reinventing the wheel is another common pitfall when implementing Policy As Code. For instance, if you want to validate code quality in your policy, it may be a good idea to query the results of a dedicated tool such as Checkov rather than re-implementing its checks.

  • Test your policies: It is essential to anticipate that a data source required for the policy may not be available. The policy engine itself may no longer be available or may contain errors. Testing your policies is highly recommended to reduce the risk related to these issues.

  • Provide a bypass procedure: In case of policy engine failure or during a production incident, it may become vital to bypass the policy engine. It is important that this case is anticipated, and a procedure must be defined to ensure the traceability of bypassing actions.

  • Provide intelligible output and train your team to understand it: The biggest challenge when implementing Policy As Code is getting teams to adopt new and potentially more restrictive practices. The policy engine’s output must be verbose enough to guide teams towards understanding the conditions under which the policies are validated. Setting up policies in non-blocking mode may also be sufficient to give visibility on security practices while not blocking the ops team’s workflow.
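To illustrate the "Test your policies" practice, here is a minimal sketch using Python's `unittest` module. The `is_compliant` function is a hypothetical Python transcription of the two Sentinel rules shown earlier (remote execution and a linked VCS repository), and the workspace dictionaries are made-up test fixtures, not real Terraform Cloud payloads.

```python
import unittest


def is_compliant(workspace: dict) -> bool:
    """Mirror of the example policy: remote run and a linked VCS repository."""
    return (workspace.get("execution_mode") == "remote"
            and workspace.get("vcs_repo") is not None)


class IsCompliantTest(unittest.TestCase):
    def test_remote_run_with_vcs_passes(self):
        ws = {"execution_mode": "remote", "vcs_repo": {"identifier": "acme-corp/infra"}}
        self.assertTrue(is_compliant(ws))

    def test_workspace_without_vcs_fails(self):
        ws = {"execution_mode": "remote", "vcs_repo": None}
        self.assertFalse(is_compliant(ws))

    def test_local_run_fails(self):
        ws = {"execution_mode": "local", "vcs_repo": {"identifier": "acme-corp/infra"}}
        self.assertFalse(is_compliant(ws))

    def test_missing_fields_fail(self):
        # Incomplete input, such as an unavailable data source, must not pass.
        self.assertFalse(is_compliant({}))


# Run the suite explicitly so the result can be inspected programmatically.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsCompliantTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The last test case is the one most easily forgotten: it checks that missing or incomplete input data leads to a refusal rather than to an accidental pass.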

You should now have a pretty clear idea of what Policy As Code is. I hope that the use cases presented and the best practices have been able to guide you in implementing your system.