Build Docker images in Kubernetes with img

6 April 2023

Docker is a must-have solution to standardize and easily deploy your applications. Given Docker's complexity, one may ask: is it safe to build container images in Kubernetes? Not with Docker-in-Docker, but a secure alternative exists: the img tool.

Docker-in-Docker: the wrong security practice

By default, Docker requires very high privileges to work properly, whether for building images or running them in containers. It must run as root because it needs to perform critical actions on the system:

  • create and manipulate namespaces (run)
  • mount and manipulate filesystems (run + build)

In addition to running as root, the Docker CLI relies on what is called the Docker socket (/var/run/docker.sock). This socket exposes the API that Docker uses for every action: it is the interface between the CLI and the Docker daemon running in the background of the system.

Access to this socket allows anyone to communicate with the Docker daemon and thus to execute almost any CLI action: listing containers, launching a privileged container, and so on.
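
To illustrate the risk, here is a hypothetical attacker's command: anyone who can reach the socket can start a privileged container, mount the host filesystem and become root on the host.

# start a privileged container through the host's Docker socket,
# mount the host filesystem and chroot into it: root on the host
docker -H unix:///var/run/docker.sock run --rm -it --privileged \
  -v /:/host alpine chroot /host sh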

These operating details have security implications when we want to run Docker inside a container, for example in Kubernetes:

  • the container must run as root;
  • the container must be privileged;
  • the container must have access to the Docker socket on the host system.

Each of these points is extremely critical to the security of a Kubernetes cluster, since each implies the existence of a highly privileged container on the cluster. Compromising this single container directly leads to the total compromise of the Kubernetes cluster.
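
For illustration, here is a sketch of the kind of pod spec these requirements translate to (all names are hypothetical); each of the commented fields is, on its own, a serious risk:

apiVersion: v1
kind: Pod
metadata:
  name: dind-build            # hypothetical name
spec:
  containers:
  - name: docker
    image: docker:latest
    securityContext:
      runAsUser: 0            # runs as root
      privileged: true        # full access to the host kernel
    volumeMounts:
    - name: docker-sock
      mountPath: /var/run/docker.sock   # host Docker API exposed to the pod
  volumes:
  - name: docker-sock
    hostPath:
      path: /var/run/docker.sock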

In addition, this critical container executes the commands that developers provide via the CI. A developer can easily modify the CI code to make the container run arbitrary code and thus compromise the cluster.

The secure solution

In order to solve the problem stated above and build images in Kubernetes in a secure way, we propose a solution based on img and fuse-overlayfs.

img is a CLI tool that provides the same commands as the Docker CLI, with a few differences that solve our security problem (see the example after this list):

  • it runs rootless (no root user needed)
  • it doesn't require a privileged container
  • it doesn't need access to the host's Docker socket
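
In practice, img is close to a drop-in replacement. A minimal sketch of a build-and-push sequence, run as an unprivileged user (the image name is hypothetical):

# same verbs as the Docker CLI, but without root, privileges or socket
img build -t registry.example.com/app:1.0.0 .
img push registry.example.com/app:1.0.0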

The only flaw of img is that it requires access to the /dev/fuse device to take full advantage of fuse-overlayfs when manipulating container image filesystems. Without fuse-overlayfs, img consumes a huge amount of disk space (several hundred GB for a single build!) because it has to recreate the entire image filesystem at each step of the Dockerfile.

Setting up

Prerequisites

  • A working Kubernetes cluster
    • Kyverno must be deployed in the cluster
  • Gitlab with a runner configured and deployed in Kubernetes
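
Before starting, a few sanity checks of these prerequisites can save time (the namespace names are assumptions, adjust them to your setup):

kubectl get nodes                # the cluster is reachable
kubectl get pods -n kyverno      # Kyverno is running
kubectl get pods -n gitlab       # the Gitlab runner is deployed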

Create an image containing img and fuse-overlayfs

The first step is to create the Docker image containing all the tools needed to build our future images with img on the Gitlab runner in Kubernetes.

# Based on: https://github.com/genuinetools/img/blob/master/Dockerfile

# ----- img ------
FROM golang:1.13-alpine AS img

RUN apk add --no-cache \
	bash \
	build-base \
	gcc \
	git \
	libseccomp-dev \
	linux-headers \
	make

WORKDIR /img
RUN go get github.com/go-bindata/go-bindata/go-bindata
RUN git clone https://github.com/genuinetools/img \
  && cd img \
  && git checkout 16d3b6cad7e72f4cd9c8dad0e159902eeee00898 \
  && make static \
  && mv img /usr/bin/img

# ----- idmap ------
FROM alpine:3.11 AS idmap
RUN apk add --no-cache autoconf automake build-base byacc gettext gettext-dev gcc git libcap-dev libtool libxslt
RUN git clone https://github.com/shadow-maint/shadow.git /shadow
WORKDIR /shadow
RUN git checkout 59c2dabb264ef7b3137f5edb52c0b31d5af0cf76
RUN ./autogen.sh --disable-nls --disable-man --without-audit --without-selinux --without-acl --without-attr --without-tcb --without-nscd \
  && make \
  && cp src/newuidmap src/newgidmap /usr/bin

# ----- img and idmap -----
FROM alpine:3.11 AS base
RUN apk add --no-cache git pigz
COPY --from=img /usr/bin/img /usr/bin/img
COPY --from=idmap /usr/bin/newuidmap /usr/bin/newuidmap
COPY --from=idmap /usr/bin/newgidmap /usr/bin/newgidmap

RUN chmod u+s /usr/bin/newuidmap /usr/bin/newgidmap \
  && adduser -D -u 1000 user \
  && mkdir -p /run/user/1000 \
  && chown -R user /run/user/1000 /home/user \
  && echo user:100000:65536 | tee /etc/subuid | tee /etc/subgid

# ----- add fuse-overlayfs and tools -----
FROM base AS final
WORKDIR /build
RUN apk add git make gcc libc-dev musl-dev glib-static gettext eudev-dev \
	linux-headers automake autoconf cmake meson ninja clang go-md2man

RUN git clone https://github.com/libfuse/libfuse && \
    cd libfuse && \
    mkdir build && \
    cd build && \
    LDFLAGS="-lpthread -s -w -static" meson --prefix /usr -D default_library=static .. && \
    ninja && \
    ninja install

RUN git clone https://github.com/containers/fuse-overlayfs \
  && cd fuse-overlayfs \
  && git checkout v1.8.2
RUN cd fuse-overlayfs && \
    ./autogen.sh && \
    LIBS="-ldl" LDFLAGS="-s -w -static" ./configure --prefix /usr && \
    make clean && \
    make && \
    make install

RUN apk add --no-cache \
    bash \
    jq \
    py3-pip \
  && pip3 install --no-cache-dir awscli \
  && rm -rf /var/cache/apk/*

# ----- rootless -----
FROM final AS release
USER user
ENV USER user
ENV HOME /home/user
ENV XDG_RUNTIME_DIR=/run/user/1000
WORKDIR /home/user
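
This builder image must then be published to a registry that the runners can pull from. A minimal sketch (the repository name my-repo/img-aws:1.0.0 is reused from the CI job shown later and is an assumption):

# build and publish the builder image used by the CI jobs
docker build -t my-repo/img-aws:1.0.0 .
docker push my-repo/img-aws:1.0.0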

DaemonSet for the fuse device

We then need to make the /dev/fuse device accessible to the Gitlab runner pods that will run img. To do this, we deploy a DaemonSet in Kubernetes (based on squat/generic-device-plugin) that exposes the /dev/fuse device of each node as a schedulable resource. The fuse device is then automatically mounted into every pod declaring a limit of type squat.ai/fuse: 1.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: generic-device-plugin
  namespace: gitlab
  labels:
    app.kubernetes.io/name: generic-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: generic-device-plugin
  template:
    metadata:
      labels:
        app.kubernetes.io/name: generic-device-plugin
    spec:
      priorityClassName: system-node-critical
      nodeSelector:
        kube/nodetype: gitlab
      containers:
      - image: squat/generic-device-plugin
        # count specifies that 15 pods are allowed to use the device simultaneously
        args:
        - --device
        - '{"name": "fuse", "groups": [{"count": 15, "paths": [{"path": "/dev/fuse"}]}]}'
        name: generic-device-plugin
        ports:
        - containerPort: 8080
          name: http
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
  updateStrategy:
    type: RollingUpdate
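
Once the DaemonSet is running, each node should advertise the new extended resource. A quick way to check (the node name is a placeholder):

kubectl -n gitlab get pods -l app.kubernetes.io/name=generic-device-plugin
kubectl describe node <node-name> | grep squat.ai/fuse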

Kyverno mutation policy

The next two steps are a little trick to add the fuse device limit to the Gitlab Runner pods in our Kubernetes cluster. Indeed, the generic Helm chart of the Gitlab Runner does not allow modifying the pods' resource limits.

So we create a Kyverno policy that mutates our pods on the fly, adding the squat.ai/fuse: 1 limit to every pod carrying the mount-fuse: "true" label.

apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: add-fuse-device
  namespace: gitlab
spec:
  rules:
  - name: add-fuse-device
    match:
      any:
      - resources:
          kinds:
          - Pod
          selector:
            matchLabels:
              mount-fuse: "true"
    mutate:
      patchesJson6902: |-
        - op: add
          path: "/spec/containers/0/resources/limits"
          value: {"squat.ai/fuse":"1"}

Gitlab Runner configuration

Then you just have to modify the Gitlab Runner configuration to add the mount-fuse: "true" label to the pods in Kubernetes.

runners:
    config: |
      [[runners]]
        [runners.kubernetes]
          [runners.kubernetes.pod_labels]
            mount-fuse = "true"
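
After this change, every job pod spawned by the runner carries the label and is therefore mutated by Kyverno. A quick check:

kubectl -n gitlab get pods -l mount-fuse=true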

Usage

To build our images in a secure way, we now only need to replace docker with img in our CI, with a few exceptions:

  • The --network parameter does not exist for img, but networking is active by default
  • For build args, it is mandatory to specify both the name and the value of the variable:
    • --build-arg "MYVAR=$MYVAR"
  • img push does not push several tags of the same image at once (unlike docker push); you need one img push per tag, as shown below.
  • docker rmi becomes img rm.
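
For example, publishing two tags of the same image takes two separate commands (the image name is hypothetical):

img push registry.example.com/app:1.0.0
img push registry.example.com/app:latest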

Example of a CI job

.release-java:
  stage: release
  image: my-repo/img-aws:1.0.0
  before_script:
    - cp -r configuration $WORKDIR
    - cd $WORKDIR
    - aws ecr get-login-password --region eu-west-3 | img login --username AWS --password-stdin ${DOCKER_URL}
    - if [[ ! -z $CI_COMMIT_TAG ]]; then export DOCKER_TAG=$(echo
      $CI_COMMIT_REF_NAME | tr @ _); fi;
  script:
    - img build
      --cache-from ${DOCKER_URL}/${DOCKER_REPO}/${DOCKER_IMAGE}:${TARGET_ENV}-latest
      -t ${DOCKER_URL}/${DOCKER_REPO}/${DOCKER_IMAGE}:${DOCKER_TAG}
      -t ${DOCKER_URL}/${DOCKER_REPO}/${DOCKER_IMAGE}:${TARGET_ENV}-latest
      -f docker/ci.Dockerfile
      --build-arg "MY_VAR=$MYVAR"
    - img push ${DOCKER_URL}/${DOCKER_REPO}/${DOCKER_IMAGE}:${DOCKER_TAG}
    - img push ${DOCKER_URL}/${DOCKER_REPO}/${DOCKER_IMAGE}:${TARGET_ENV}-latest
    - img rm ${DOCKER_URL}/${DOCKER_REPO}/${DOCKER_IMAGE}:${DOCKER_TAG}
    - img rm ${DOCKER_URL}/${DOCKER_REPO}/${DOCKER_IMAGE}:${TARGET_ENV}-latest
  variables:
    DOCKER_TAG: ${CI_COMMIT_SHA}

Conclusion

We have just seen why the docker-in-docker method is not secure for building images in Kubernetes. We then explored a solution based on img, which however requires quite a few adjustments in Kubernetes to achieve good performance.

Another possible secure solution, which we have not detailed here, is Kaniko. We did not choose it because, in our opinion, it is less flexible than img.

Indeed, the kaniko image does not support adding steps other than the image build itself, whereas it is often useful to run actions after the build, such as scanning the image.