[Image: Argo Workflows]

7 March 2024

Running a task across all of your GitHub repositories, whether to apply best practices at scale or to implement security processes such as secret detection, is a common need. Unfortunately, GitHub does not offer the features to do this natively. Argo Workflows can help solve this issue.

What is Argo Workflows?

Argo Workflows is a workflow orchestrator on Kubernetes, where job specifications are implemented as Kubernetes CRDs. It benefits from the power of Kubernetes to run parallel tasks in containers.

Architecture

Argo Workflows is composed of two main components:

  • The Argo Server provides a UI from which users can check the status of their jobs and create workflows.
  • The Workflow Controller watches Argo Workflows CRDs and creates pods to run jobs based on the specification of a Workflow. A pod is always created in the same namespace as the Workflow that defines the associated task.

[Figure: Argo Workflows architecture]

Argo Workflows CRDs

Argo Workflows offers the following CRDs:

  • Workflow: a Workflow represents a job that must be run. It also stores the state of the corresponding job. It is the most important resource in Argo Workflows.
  • WorkflowTemplate: a WorkflowTemplate is a template for a Workflow, so that it can be reused by multiple Workflows.
  • ClusterWorkflowTemplate: this resource serves the same purpose as WorkflowTemplates, except it is not namespace-scoped. In multi-tenant clusters, it is useful to have a template shared by all tenants.
  • CronWorkflow: a CronWorkflow is a Workflow that runs on a preset schedule.
  • WorkflowEventBinding: this resource is used by the Argo Server to create Workflows in response to events. It is the resource that is used to bind a GitHub webhook to an Argo Workflow.

How to interface Argo Workflows with GitHub webhooks?

Architecture

GitHub webhooks can be integrated with Argo Workflows with the architecture below.

[Figure: GitHub webhooks architecture]

In this scenario, after a GitHub event is triggered:

  1. The Argo Server receives an HTTP request with the webhook payload from GitHub.
  2. The Argo Server looks for the WorkflowEventBinding that is linked to the event.
  3. The Argo Server creates a Workflow following the specification of the WorkflowEventBinding.
  4. The Workflow Controller detects the new Workflow.
  5. The Workflow Controller creates a Pod to run the workflow.

Step-by-step example: run a job after a GitHub push event

1 - Install Argo Workflows

Argo Workflows can be installed using Helm (see Argo Workflows Helm Chart).

For a minimal installation that works with GitHub webhooks, we can use the following Helm values:

workflow:
  # create the default service account for workflows (with no rights)
  serviceAccount:
    create: true

controller:
  workflowDefaults:
    spec:
      # associate a default service account to workflows
      serviceAccountName: argo-workflow
  # namespaces where the controller looks for workflows
  workflowNamespaces: ["workflows"]
  workflowRestrictions:
    # only process Workflows using workflowTemplateRef, to prevent arbitrary workflow creation
    templateReferencing: Secure

server:
  authModes:
    # required to authorize webhook events
    - client
  ingress:
    enabled: true
    # ✏️ configure your own ingress here
    hosts:
      - argo.xxx.com

2 - Authorize the GitHub webhook to trigger workflows in Argo Workflows

Once the Workflow Controller and the Argo Server are installed, the next step is to configure permissions. Permissions in the Argo Server rely on the principle that all identities (users, external services, etc.) must eventually be mapped to a service account.

The rights this service account has on Kubernetes objects (expressed as Kubernetes RBAC) represent the permissions that mapped identities have.

External services can call the Argo server events API to trigger workflows in response to external events. They authenticate to the Argo server by passing the token of the service account they are bound to as a Bearer token in the Authorization header.
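
As a rough illustration of this token-based call, the Python sketch below builds (without sending) such a request. The path `/api/v1/events/{namespace}/{discriminator}` is the Argo Server events endpoint; the host `argo.xxx.com` and the `workflows` namespace come from this article, while the function name `build_event_request`, the token placeholder and the sample payload are assumptions made for the example.

```python
import json
import urllib.request


def build_event_request(base_url, namespace, discriminator, token, payload):
    """Build a POST request for the Argo Server events API (nothing is sent)."""
    url = f"{base_url.rstrip('/')}/api/v1/events/{namespace}/{discriminator}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # the service account token is passed as a Bearer token
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: replace the token with a real service account token
request = build_event_request(
    "https://argo.xxx.com",
    "workflows",
    "hello-world",
    "<service-account-token>",
    {"message": "hello"},
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send the event to the server.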

Unfortunately, some services like GitHub don’t allow customizing headers when sending webhooks. For these services, the permission system works differently: it relies on a secret that maps the secrets used to sign webhooks to service accounts.
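
To make the idea of a signed webhook more concrete, here is a minimal Python sketch of the scheme GitHub documents for its `X-Hub-Signature-256` header: an HMAC-SHA256 of the raw request body, keyed with the webhook secret and prefixed with `sha256=`. The lookup function only approximates the mapping described above and is not the Argo Server's actual implementation; `sign_payload`, `resolve_service_account` and the sample data are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def sign_payload(secret: str, body: bytes) -> str:
    """Compute a GitHub-style signature: 'sha256=' + hex HMAC-SHA256 of the body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def resolve_service_account(clients: dict, body: bytes, signature: str) -> Optional[str]:
    """Return the name of the first client whose secret validates the signature."""
    for service_account, secret in clients.items():
        expected = sign_payload(secret, body)
        # compare_digest avoids leaking information through timing differences
        if hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
            return service_account
    return None


# Hypothetical data: one signing secret per service account
clients = {"github-webhook": "MyV3rYS3cr€t", "other-service": "another-secret"}
body = b'{"ref": "refs/heads/main"}'
signature = sign_payload("MyV3rYS3cr€t", body)

print(resolve_service_account(clients, body, signature))  # github-webhook
```

The point of the design is that the signature alone both authenticates the sender and selects the service account, so no custom header is needed.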

Therefore, in GitHub’s case, we need to:


1. Create a new service account for the service and a secret holding its token.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: github-webhook
  namespace: workflows
  annotations:
    # Tells the Argo Server which secret holds this service account's token
    workflows.argoproj.io/service-account-token.name: github-webhook
---
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: github-webhook
  # must be in the same namespace as the service account
  namespace: workflows
  annotations:
    kubernetes.io/service-account.name: github-webhook

2. Create the necessary Kubernetes RBAC resources granting permissions to the service on Argo Workflows resources. In our case, we only require our ServiceAccount to be able to submit workflows.

# minimum role to submit Workflows from WorkflowTemplates
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: submit-workflow-template
  namespace: workflows
rules:
  - apiGroups: ["argoproj.io"]
    resources: ["workfloweventbindings"]
    verbs: ["list"]
  - apiGroups: ["argoproj.io"]
    resources: ["workflowtemplates"]
    verbs: ["get"]
  - apiGroups: ["argoproj.io"]
    resources: ["workflows"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: github-webhook
  namespace: workflows
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: submit-workflow-template
subjects:
  - kind: ServiceAccount
    name: github-webhook
    namespace: workflows
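
To illustrate what this Role permits, the following Python sketch evaluates requests against the same three rules. It is a deliberately simplified model of Kubernetes RBAC (no wildcards, resource names or cluster-wide rules), and `RULES` and `is_allowed` are hypothetical names used only for this example.

```python
# The three rules of the submit-workflow-template Role, as plain data
RULES = [
    {"apiGroups": ["argoproj.io"], "resources": ["workfloweventbindings"], "verbs": ["list"]},
    {"apiGroups": ["argoproj.io"], "resources": ["workflowtemplates"], "verbs": ["get"]},
    {"apiGroups": ["argoproj.io"], "resources": ["workflows"], "verbs": ["create"]},
]


def is_allowed(rules, api_group, resource, verb):
    """Return True if at least one rule grants the verb on the resource."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


print(is_allowed(RULES, "argoproj.io", "workflows", "create"))  # True
print(is_allowed(RULES, "argoproj.io", "workflows", "delete"))  # False
```

The service account can find bindings, read templates and create Workflows, but it cannot modify or delete anything.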

3. Finally, we need to tell the Argo Server which secret holds the credentials for our GitHub webhook, as well as the necessary information to check the webhook signature. We can do this by creating the following argo-workflows-webhook-clients secret:

apiVersion: v1
kind: Secret
metadata:
  # ⚠️ The Argo Server will specifically look for this secret
  # Do not change its name!
  name: argo-workflows-webhook-clients
  namespace: workflows
stringData:
  # The key in the secret must match the name of the service account we want to bind to this webhook
  # The secret holds the credentials that are used for the signature of the webhook. The Argo Server also uses this signature to determine which secret key to look for.
  github-webhook: |
    type: github
    secret: "MyV3rYS3cr€t"

3 - Bind the GitHub webhook to a Workflow with a WorkflowEventBinding

The final step is to create the Workflow that will run our job and bind it to the GitHub webhook.

For that, we create a WorkflowTemplate that will be referenced by our WorkflowEventBinding. This template uses parameters that will be filled in by the WorkflowEventBinding.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: hello-world
spec:
  entrypoint: whalesay
  ttlStrategy:
    # delete the workflow 1 week after it completes if it failed
    secondsAfterFailure: 604800
    # delete the workflow 1 day after it completes if it succeeded
    secondsAfterSuccess: 86400
  templates:
    - name: whalesay
      inputs:
        parameters:
          - name: git_url
            # this value will be filled by the WorkflowEventBinding
            value: ""
          - name: git_ref
            # this value will be filled by the WorkflowEventBinding
            value: ""
          - name: head_commit
            # this value will be filled by the WorkflowEventBinding
            value: ""
      container:
        image: docker/whalesay
        command: [cowsay]
        args:
          # input parameters are injected into the args of the job pod
          - "hello world from {{inputs.parameters.git_url}}/{{inputs.parameters.git_ref}}#{{inputs.parameters.head_commit}}"
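
Argo Workflows templates refer to input parameters with `{{inputs.parameters.<name>}}` tags, which are replaced with the actual values when the pod is created. The Python sketch below imitates that substitution for illustration only; `render_args` is a hypothetical helper, not Argo's template engine, and the parameter values are made up.

```python
import re

# Matches tags such as {{inputs.parameters.git_url}}
TAG = re.compile(r"\{\{\s*inputs\.parameters\.([A-Za-z0-9_-]+)\s*\}\}")


def render_args(args, parameters):
    """Replace every input-parameter tag in args with its value."""

    def substitute(match):
        return parameters[match.group(1)]

    return [TAG.sub(substitute, arg) for arg in args]


# Made-up values standing in for what the WorkflowEventBinding would provide
parameters = {
    "git_url": "https://github.com/example-org/example-repo.git",
    "git_ref": "refs/heads/main",
    "head_commit": "abc123",
}
args = ["hello world from {{inputs.parameters.git_url}}"]

print(render_args(args, parameters))
```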

Then, we can create our WorkflowEventBinding to trigger a job using the above template when an event is received:

apiVersion: argoproj.io/v1alpha1
kind: WorkflowEventBinding
metadata:
  name: github-push-webhook
spec:
  event:
    # This selector matches github push event webhooks targeting the "hello-world" event
    selector: discriminator == "hello-world" && metadata["x-github-event"] == ["push"]
  submit:
    workflowTemplateRef:
      name: hello-world
    arguments:
      parameters:
        - name: git_url
          valueFrom:
            event: payload.repository.clone_url
        - name: git_ref
          valueFrom:
            event: payload.ref
        - name: head_commit
          valueFrom:
            # expressions can be used to retrieve information from the JSON payload that is sent
            event: 'len(payload.commits) > 0 ? payload.commits[len(payload.commits) - 1].id : "null"'
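
To see what this binding does with an incoming event, the Python sketch below mirrors the selector and the three `valueFrom.event` expressions on a trimmed-down push payload. The function names and the sample payload (repository URL, commit IDs) are invented for the example; the real evaluation is performed by the Argo Server's expression engine.

```python
def matches_selector(discriminator, metadata):
    """Mirror of the binding's selector; metadata maps lower-case header names to lists."""
    return discriminator == "hello-world" and metadata.get("x-github-event") == ["push"]


def extract_parameters(payload):
    """Mirror of the three valueFrom.event expressions of the binding."""
    commits = payload.get("commits", [])
    return {
        "git_url": payload["repository"]["clone_url"],
        "git_ref": payload["ref"],
        # same fallback as the binding: the string "null" when there is no commit
        "head_commit": commits[-1]["id"] if len(commits) > 0 else "null",
    }


# Invented sample data shaped like a GitHub push event
sample_metadata = {"x-github-event": ["push"]}
sample_payload = {
    "ref": "refs/heads/main",
    "repository": {"clone_url": "https://github.com/example-org/example-repo.git"},
    "commits": [{"id": "1111111"}, {"id": "2222222"}],
}

if matches_selector("hello-world", sample_metadata):
    print(extract_parameters(sample_payload))
```

With this sample, `head_commit` resolves to the ID of the last commit in the push, `2222222`.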

That’s it, we can try to push a commit to a repository covered by the webhook to test the whole machinery!


Argo Workflows VS GitHub Rulesets

GitHub recently introduced Rulesets, which define how people can interact with branches and tags in repositories. At the organization level, they can be used to require that specific workflows pass before a pull request can be merged. However, Rulesets have several limitations compared to Argo Workflows.

Argo Workflows

Advantages

  • Works with any type of event emitted by GitHub
  • Jobs run in Kubernetes and thus can interact with other components or the Kubernetes API in an easy way
  • Really powerful to run compute-intensive jobs

Drawbacks

  • Need to manage a dedicated tool
  • Initial setup is complex, especially if you want your users to be able to log in on the Argo Server
  • The Argo Server UI is not as smooth as that of ArgoCD

GitHub Rulesets

Advantages

  • Integrated with GitHub directly; jobs are implemented as GitHub Actions
  • Easy to set up

Drawbacks

  • Only work with pull_request, pull_request_target and merge_group events
  • Only available in paid plans for private repositories

Conclusion

Argo Workflows makes it possible to go beyond the limits of GitHub Rulesets and run workflows at the scale of an organization. However, it requires managing a dedicated tool and writing tasks in a syntax different from that of GitHub workflows.

It is especially interesting when you also want to use it for other use cases, for example, to enable users to run on-demand workflows with a user-friendly interface.