Falco Kubernetes: the threat detection engine

27 June 2023

Do you have a Kubernetes cluster, managed by your cloud provider or on-premise, and want to improve its security? One component of a defense-in-depth approach is a threat detection engine. In this article, we take a look at Falco, a cloud-native runtime security project.

A Kubernetes threat detection engine

Falco is a “cloud-native runtime security” open-source tool that aims to be THE threat detection engine on Kubernetes.

In other words, Falco detects threats at runtime by observing and monitoring the behavior of the different components of your Kubernetes cluster: Nodes, Pods, Applications, Kubernetes API, … To do this, Falco uses information from Linux system calls and Kubernetes Audit Logs.

Examples of interesting unusual behaviors detectable by Falco are:

Privilege escalation using privileged containers
Container escape attempts that change Linux namespaces
Mutating a ConfigMap with private credentials
Mutating resources in kube-system namespace
An untrusted node trying to join the cluster
Reads/writes to well-known directories such as /etc, /usr/bin, /usr/sbin, etc.
Ownership and Linux Mode changes
Unexpected network connections or socket mutations
Spawned processes using execve or executing shell binaries
Mutating Linux coreutils executables
Mutating login binaries

Here is a classic attack scenario that can be detected by Falco:

The attacker exploits an RCE (Remote Code Execution) vulnerability on an application pod
He spawns a remote shell ⇒ detected because of unexpected shell execution and network connection
He explores the container’s filesystem for critical information ⇒ detected because of reading files in /etc and/or /usr
He discovers Kubernetes credentials
Finally, he creates a privileged container and escapes from it to become root on the Node ⇒ detected because of privileged container creation and Linux namespace changes

The main workflow of the Falco engine is:

Getting the event stream by:
- Parsing the Linux system calls from the kernel at runtime for each Node
- Parsing Kubernetes Audit Logs and Metadata
Checking the stream against a powerful rules engine
Alerting when a rule is violated
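
As a rough mental model, the three steps above can be sketched in a few lines of Python. This is only an illustration: the `Rule` class, the event dictionaries, the `write_below_etc` rule and the `run_engine` function are all hypothetical, and the sketch says nothing about how Falco is actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Event = dict  # one syscall or audit event, flattened into field -> value


@dataclass
class Rule:
    name: str
    condition: Callable[[Event], bool]  # returns True when the event is suspicious
    priority: str


def run_engine(events: Iterable[Event], rules: list[Rule]) -> list[str]:
    """Check every event against every rule and collect the resulting alerts."""
    alerts = []
    for event in events:               # 1. consume the event stream
        for rule in rules:             # 2. check the event against the rules
            if rule.condition(event):  # 3. alert when a rule matches
                alerts.append(f"{rule.priority}: {rule.name}")
    return alerts


# Hypothetical rule and events, for illustration only.
rules = [
    Rule(
        name="write_below_etc",
        condition=lambda e: e.get("evt.type") == "open"
        and e.get("fd.name", "").startswith("/etc/"),
        priority="ERROR",
    )
]
events = [
    {"evt.type": "open", "fd.name": "/tmp/cache"},
    {"evt.type": "open", "fd.name": "/etc/shadow"},
]
alerts = run_engine(events, rules)
print(alerts)
```

Only the second event matches the rule, so a single alert is produced.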

To better understand this, here is a diagram of how Falco works.

[Figure: Falco architecture schema]

Falco Driver (Kernel Module or eBPF probe)

The Falco driver is a piece of software installed on each Node of the Kubernetes cluster. Its role is to observe the system workload and pass security events to userspace as a stream of system call information.

Currently, Falco supports the following drivers (also called syscall sources):

Kernel module built on libscap and libsinsp C++ libraries (default)
eBPF probe built from the same modules
Userspace instrumentation
Modern eBPF probe (experimental)

Kubernetes Audit Events

To retrieve more information and context about the Kubernetes cluster, Falco uses Kubernetes Audit Logs and Metadata as another event source. Because almost all cluster management tasks are performed through the API server, the audit log can effectively track the changes made to the cluster.

To enable Kubernetes Audit Events, the associated plugin must be installed. The installation process may depend on the cloud provider in the case of a managed Kubernetes cluster.

Falco Rules

Falco Rules are the items that Falco checks events against. They describe all the events you want to watch for on the cluster, and for which you want to write a log or trigger an alert.

A Falco rules file is a YAML file containing mainly three types of elements:

Rules: Conditions under which an alert should be generated. A rule is accompanied by a descriptive output string that is sent with the alert.
Macros: Rule condition snippets that can be re-used inside rules and even other macros. Macros provide a way to name common patterns and factor out redundancies in rules.
Lists: Collections of items that can be included in rules, macros, or other lists. Unlike rules and macros, lists cannot be parsed as filtering expressions.
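
To make the relationship between these three elements more concrete, here is a small Python sketch that inlines lists and macros into a rule condition by plain text substitution. The names `shell_binaries`, `spawned_process` and `in_container` are chosen for this illustration, and the `expand` function is hypothetical: it only mimics the idea of re-usable snippets, not Falco's real rule loader (for instance, it does not handle macros that reference other macros).

```python
import re

# Illustrative list and macros; names and contents are assumptions for this sketch.
LISTS = {"shell_binaries": ["bash", "ksh"]}
MACROS = {
    "spawned_process": "evt.type = execve and evt.dir = <",
    "in_container": "container.id != host",
}


def expand(condition: str) -> str:
    """Inline lists first, then macros, into a condition string (single pass)."""
    for name, items in LISTS.items():
        joined = ", ".join(items)
        condition = re.sub(rf"\b{re.escape(name)}\b", lambda _m: joined, condition)
    for name, body in MACROS.items():
        wrapped = f"({body})"
        condition = re.sub(rf"\b{re.escape(name)}\b", lambda _m: wrapped, condition)
    return condition


rule_condition = "spawned_process and in_container and proc.name in (shell_binaries)"
expanded = expand(rule_condition)
print(expanded)
```

The short, readable condition on the rule side expands into the full filtering expression, which is the kind of redundancy that macros and lists are meant to factor out.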

Falco comes with a set of pre-defined rules, but most of the time you will need to customize and adapt them to fit your project and limit false positives (of which, in my experience, there can be many).

Here is an example of a rule:

- rule: shell_in_container
  desc: notice shell activity within a container
  condition: >
    evt.type = execve and 
    evt.dir = < and 
    container.id != host and 
    (proc.name = bash or
     proc.name = ksh)    
  output: >
    shell in a container
    (user=%user.name container_id=%container.id container_name=%container.name 
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)    
  priority: WARNING

rule: the name of the rule
desc: a short description explaining what this rule checks for
condition: conditions under which the rule should generate an alert
- if the event type is execve, inside a container (not host), and the process name is bash or ksh
- in other words, if bash or ksh are launched inside a container
output: a more detailed description of the rule with context, which is passed along with the generated alert. It is very important to give as much context as possible to facilitate a future investigation.
priority: the priority level of the rule. Useful for filtering rules, for instance to only log low-priority rules and trigger an alert on high-priority ones.
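
To illustrate how the condition and output fields of the rule above fit together, here is a minimal Python sketch that evaluates the same condition on an event and fills in the %field placeholders. The event dictionary, the `render_output` helper, the shortened output template and the `<NA>` fallback are assumptions made for this example; this is not how Falco's engine is implemented.

```python
import re


def shell_in_container(event: dict) -> bool:
    """Python transcription of the rule's condition field."""
    return (
        event.get("evt.type") == "execve"
        and event.get("evt.dir") == "<"
        and event.get("container.id") != "host"
        and event.get("proc.name") in ("bash", "ksh")
    )


def render_output(template: str, event: dict) -> str:
    """Replace each %field placeholder with the matching value from the event."""
    return re.sub(
        r"%([a-z_.]+)",
        lambda m: str(event.get(m.group(1), "<NA>")),  # fallback chosen for this sketch
        template,
    )


# Shortened version of the rule's output field.
OUTPUT = "shell in a container (user=%user.name container_id=%container.id shell=%proc.name)"

# Hypothetical event: bash has just been started inside a container.
event = {
    "evt.type": "execve",
    "evt.dir": "<",
    "container.id": "3ad7b26ded6d",
    "user.name": "root",
    "proc.name": "bash",
}

if shell_in_container(event):
    print(render_output(OUTPUT, event))
```

The same event with container.id set to host would not match, since the rule only targets shells started inside containers.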

Filter engine

This is the core component of Falco. It retrieves and analyzes the stream of events from the driver (syscalls) and Kubernetes Audit Logs and searches for a match between these events and a rule from Falco Rules. Matching events are sent with context information to the alerting component.

Alerting

Alerts are configurable downstream actions that can be as simple as logging to STDOUT or as complex as delivering a gRPC call to a client. Falco can send alerts to:

Standard Output
A file
Syslog
A spawned program
An HTTP[s] endpoint
A client through the gRPC API

To get more alerting channels, you can use Falco sidekick. Falco sidekick is a companion component of Falco: it receives Falco alerts over HTTP and forwards them to other alerting channels (Slack, Discord, AlertManager, Buckets, AWS SQS, GCP PubSub, …).
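
The idea of forwarding alerts to different channels depending on their priority can be sketched as follows. This is a simplified illustration and not Falco sidekick's actual code: the `CHANNELS` table and the `route` function are hypothetical, while `rule`, `priority` and `output` are fields found in Falco's JSON alerts.

```python
import json

# Falco priority levels, from most to least severe.
PRIORITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]

# Hypothetical routing table: channel name -> minimum priority it accepts.
CHANNELS = {"slack": "critical", "logfile": "debug"}


def is_at_least(priority: str, minimum: str) -> bool:
    """True when `priority` is as severe as `minimum`, or more severe."""
    return PRIORITIES.index(priority.lower()) <= PRIORITIES.index(minimum.lower())


def route(raw_alert: str) -> list[str]:
    """Return the channels that should receive this JSON-encoded alert."""
    alert = json.loads(raw_alert)
    return [name for name, minimum in CHANNELS.items()
            if is_at_least(alert["priority"], minimum)]


warning_alert = json.dumps({
    "rule": "shell_in_container",
    "priority": "Warning",
    "output": "shell in a container (user=root shell=bash)",
})
critical_alert = json.dumps({
    "rule": "example_critical_rule",  # hypothetical rule name
    "priority": "Critical",
    "output": "something critical happened",
})

print(route(warning_alert))   # only the log file receives the warning
print(route(critical_alert))  # both channels receive the critical alert
```

With such a setup, noisy low-priority alerts stay in the logs while only the most severe ones reach a chat channel.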

Kubernetes threat detection comparison

The main alternative to Falco for threat detection in Kubernetes clusters is the cloud providers' security services: AWS GuardDuty or Google Security Command Center. These services monitor and improve the security of cloud resources on the corresponding cloud provider, and therefore cover EKS (AWS) and GKE (GCP) clusters with threat detection.

They are really simple to deploy and integrate: it is done in a few clicks. However, they come at a cost and obviously do not cover on-premise Kubernetes clusters.

Falco’s strengths:

Open source and free
Based on low-level events: syscalls and Kubernetes Audit Logs
Customizable thanks to the Falco Rules syntax and Falco sidekick for alerting
Can provide a lot of specific context about an alert to make the investigation easier (node, namespace, container, Linux user, Linux process, …)
Cloud agnostic
Can be integrated into an on-premise Kubernetes Cluster

Falco’s weaknesses:

Default rule sets need to be tweaked to prevent false positives
Needs to be deployed inside the target Kubernetes cluster: in the event of a complete cluster compromise, Falco could be disabled by the attacker

Deploying Falco

Now, it’s time for tech! We will see how to deploy and configure Falco for a simple setup.

Deployment

Deploy Falco as a Helm Chart

The easiest way to deploy Falco on your Kubernetes cluster is to use the official Falco Helm Chart.

Here is a simple values.yaml file to deploy Falco with:

kernel module driver
all default rule sets
k8saudit plugin
Falco sidekick configured to send alerts to a Slack channel

# values.yaml

driver:
  enabled: true
  kind: module

falco:
  rules_file:
    - /etc/falco/falco_rules.yaml
    - /etc/falco/falco_rules.local.yaml
    - /etc/falco/k8s_audit_rules.yaml
    - /etc/falco/rules.d
  load_plugins: [k8saudit, json]
  json_output: true
  priority: warning

  # Publish to falco sidekick
  http_output:
    enabled: true
    url: "http://falcosidekick:2801/"

falcosidekick:
  enabled: true
  replicaCount: 2
  config:
    slack:
      webhookurl: "<slack_webhook>"
      footer: ""
      icon: ""
      username: "Falco"
      outputformat: "all"
      minimumpriority: "critical"
      messageformat: "[<environment>]"

To install the chart with the release name falco in the namespace falco, using the values.yaml file above, run:

helm install falco falcosecurity/falco --namespace falco --create-namespace -f values.yaml

The chart's default configuration deploys Falco as a DaemonSet, so you should have one Falco pod on each node.

I would give the deployment process a mark of 5/5 because deploying Falco on Kubernetes is really simple.

Configuration

To have a complete working threat detection system, we still have to configure two main points: alerting and rules.

Create Slack Webhook for Falco sidekick alerting

Falco sidekick receives alert events from Falco and, as we configured previously, tries to send them to a Slack channel. We now need to create this channel:

Create an Incoming Webhook for Slack
- Create a new Slack app falco for your workspace
- Create a channel alert-falco on Slack
  - ⚠️ Create a PRIVATE channel, since alerts could contain sensitive information about your infrastructure.
- Enable incoming webhooks for the app and add a webhook for the alert-falco channel
- Copy the webhook URL and complete the values.yaml file (falcosidekick.config.slack)

Customize Falco Rules

From my experience, Falco's default rules generate a lot of false positives and are not always relevant depending on the context. That’s why I strongly recommend you take the time to customize Falco rules.

To do so, define a set of relevant rules for your project by drawing from the predefined rules or by creating new ones. Then, to include these new rules when deploying Falco, add this option to the helm install/upgrade command: --set-file 'customRules.my_rules\.yaml'=<path/to/my_rules.yaml>

There will be false positives at the beginning: pay attention to the alerts and, whenever one turns out to be a false positive, edit the rules accordingly.

I would give the configuration process a mark of 4/5 because Falco is very customizable thanks to Falco sidekick for alerting and the Falco Rules system. However, it is hard and takes time to define good rules and prevent false positives.

Operator experience

From an operator's point of view, Falco should be easy to maintain once rules are defined and false positives cleaned.

The major difficulty is alert management: you will want to define a process to handle alerts. For each alert, investigate, determine whether it is a false positive or a real security issue, and take action accordingly.

Another difficulty I have seen in my projects occurs when updating the node kernel. You may want to update your nodes while no Falco driver is available yet for the new kernel version. In that case, you would need to request such a build or build your own Falco driver.

This situation is really annoying because it leaves us with a dilemma:

Delay the node update to wait for a driver version compatible with the target kernel. However, quick node updates may be critical for security reasons
Disable Falco for a while to wait for a compatible driver version. This implies an obvious security risk

I would therefore give operator experience a 2.5/5 mark.

User experience

For discovery blog posts at Padok, we like to reflect on user experience when using a particular technology. However, for Falco, it’s not relevant as the tool is not intended for end users.

The “user” of Falco could be the receiver of Falco alerts (often an operator). In this case, alerts can be fully customized as needed.

Conclusion

In a nutshell, Falco is a great solution for runtime Kubernetes threat detection in a managed or on-premise cluster. However, be aware that tweaking the rules can be difficult and time-consuming, and that there may be a lot of false positives at first.

Topic | Rating
Deployment | 5/5
Configuration | 4/5
Operator experience | 2.5/5
User experience | N/A
Final | 4/5