Container Security

Security Foundations

Recap: what Docker is really securing
  • Security through isolation mechanisms
    • Namespaces decide what a process can see.
    • Cgroups decide what a process can consume.
    • Chroot / pivot_root / mount isolation decide what filesystem tree a process can reach.
  • A secure container deployment also needs:
    • safe image construction,
    • safe runtime configuration,
    • safe networking,
    • safe storage choices,
    • safe secret handling,
    • and a patched host kernel.
A useful mental model
  • Container security is best understood in layers:
    1. Host security
    2. Image / supply-chain security
    3. Runtime hardening
    4. Network and storage controls
    5. Orchestration and secret management
  • If the host is weak, containers inherit that weakness.
  • If the image is weak, runtime hardening can only contain the damage; it cannot remove vulnerabilities already baked into the image.

Threat Model and Security Layers

Containers versus virtual machines
  • Virtual machines package:
    • a guest kernel,
    • guest user space,
    • and hardware virtualization through a hypervisor.
  • Containers package:
    • application code and user space,
    • but they share the host kernel.
  • Consequence:
    • containers are lighter and faster,
    • but a kernel vulnerability is more dangerous: an exploit from one container can compromise the shared host kernel and, with it, every other container on the host.
Linux primitives that support container security
  • Namespaces
    • isolate process trees, mount tables, hostnames, IPC, networks, and user IDs.
  • cgroups
    • limit CPU, memory, and I/O to reduce denial-of-service impact.
  • seccomp
    • filters system calls and reduces kernel attack surface.
  • AppArmor / SELinux
    • apply mandatory access-control policies beyond ordinary UNIX permissions.
  • Capabilities
    • split root privilege into smaller units so that containers can run with fewer privileges.
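You can see several of these primitives from inside any Linux process, including a process in a container, by reading /proc. This is a minimal sketch that assumes Linux with /proc mounted; the helper name proc_security_summary is hypothetical. CapEff is the effective capability mask, NoNewPrivs reports whether no-new-privileges is set, and Seccomp reports the seccomp mode (2 means a filter is active).

```shell
# Hypothetical helper: summarize security-relevant kernel state.
# /proc/self is resolved by each command (grep, readlink), which inherits
# capabilities, seccomp mode, and namespaces from the calling shell.
proc_security_summary() {
  # CapEff/CapBnd: effective and bounding capability bitmasks (hex)
  # NoNewPrivs: 1 if no-new-privileges is set
  # Seccomp: 0 = off, 1 = strict, 2 = filter
  grep -E '^(CapEff|CapBnd|NoNewPrivs|Seccomp):' /proc/self/status
  # Each namespace is a symlink such as net:[4026531840]; two processes
  # share a namespace when the bracketed inode numbers match.
  for ns in /proc/self/ns/*; do
    [ -e "$ns" ] || continue          # skip if /proc/self/ns is unavailable
    printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns")"
  done
}

proc_security_summary
```

Running docker run --rm alpine grep CapEff /proc/self/status and then the same command with --cap-drop=ALL added should show the effective mask drop to all zeros.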
The big misconception: isolation is not immunity
  • A container can be isolated and still be insecure.
  • Common reasons:
    • the process runs as root,
    • the container is privileged,
    • broad Linux capabilities are left enabled,
    • dangerous host paths are bind-mounted,
    • unnecessary ports are published,
    • secrets are baked into the image,
    • the host or runtime is out of date.
Rootless mode vs. container non-root users
  • Rootless Docker (recommended)
    • The Docker daemon and its containers run as a non-root user on the host.
  • Default Docker
    • The daemon usually runs as root, even if the app inside the container does not.
  • User namespace remapping
    • Lets a container appear to run as root internally while mapping that root user to an unprivileged UID on the host.
  • USER in Dockerfile
    • Controls which user the containerized application runs as.
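User namespace remapping is configured on the daemon, not per container. The sketch below assumes a rootful Docker Engine that reads /etc/docker/daemon.json; the helper write_userns_daemon_json is hypothetical and takes the target path as a parameter so you can preview the file first. The value "default" tells Docker to create and use a dockremap user whose subordinate UID/GID ranges come from /etc/subuid and /etc/subgid.

```shell
# Hypothetical helper: write a daemon.json that enables user namespace remapping.
# Note: this overwrites any existing settings in the target file.
write_userns_daemon_json() {
  local target="${1:-/etc/docker/daemon.json}"
  cat > "$target" <<'EOF'
{
  "userns-remap": "default"
}
EOF
}

# Preview into a temporary file first.
preview="$(mktemp)"
write_userns_daemon_json "$preview"
cat "$preview"

# To apply for real (requires root), merge with any existing settings, then:
#   sudo install -m 0644 "$preview" /etc/docker/daemon.json
#   sudo systemctl restart docker
```

After the restart, docker info should list name=userns under Security Options. Containers then see UID 0 inside, while the host sees a high, unprivileged UID.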
Hands-on: set up and verify Rootless Docker
  • Install the prerequisites and check that your user has subordinate UID/GID ranges:
sudo apt-get update
sudo apt-get install -y uidmap docker-ce-rootless-extras
grep "^$USER:" /etc/subuid
grep "^$USER:" /etc/subgid
  • Make sure the system-wide (rootful) Docker daemon is not running:
sudo systemctl disable --now docker.service docker.socket
sudo rm -f /var/run/docker.sock
  • Install rootless Docker for the current user:
dockerd-rootless-setuptool.sh install
systemctl --user start docker
sudo loginctl enable-linger $USER
export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock
  • Verify that Docker is running in rootless mode:
docker info
docker context ls
docker run --rm hello-world
  • Observations:
    • docker info should list name=rootless under Security Options,
    • docker context ls usually shows a rootless context created by the setup tool,
    • the socket should be /run/user/<UID>/docker.sock, where <UID> is the output of id -u.
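The checks above can also be scripted. This is a minimal sketch; both function names are hypothetical. The first test is a pure string comparison on DOCKER_HOST, so it fails if you rely on the rootless CLI context instead of the environment variable; the second asks the daemon directly and needs a running Docker.

```shell
# Hypothetical helper: succeed only if DOCKER_HOST points at the per-user
# rootless socket used by dockerd-rootless-setuptool.sh.
uses_rootless_socket() {
  local expected="unix:///run/user/$(id -u)/docker.sock"
  [ "${DOCKER_HOST:-}" = "$expected" ]
}

# Hypothetical helper: ask the daemon itself; rootless daemons report name=rootless.
daemon_reports_rootless() {
  docker info --format '{{json .SecurityOptions}}' | grep -q 'name=rootless'
}

if uses_rootless_socket; then
  echo "DOCKER_HOST targets the rootless socket"
else
  echo "DOCKER_HOST is ${DOCKER_HOST:-unset}; the CLI may be using another context" >&2
fi
```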

Build and Supply-Chain Security

Minimal base images and multi-stage builds
  • Prefer:
    • minimal base images,
    • only the runtime dependencies you actually need,
    • multi-stage builds so compilers, package managers, and test tools do not appear in the final image.
  • Smaller images usually reduce:
    • package count,
    • attack surface,
    • patching burden,
    • and scan noise.
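A multi-stage build keeps the toolchain in a throwaway stage and copies only the result into a small runtime image. The sketch below writes such a Dockerfile for a Go service; it assumes a Go module at the root of the build context, and the helper name write_multistage_dockerfile is hypothetical. The distroless nonroot base also supplies a non-root user.

```shell
# Hypothetical helper: write a multi-stage Dockerfile for a Go service.
# Assumes a Go module (go.mod + main package) at the root of the build context.
write_multistage_dockerfile() {
  local dir="${1:-.}"
  cat > "$dir/Dockerfile" <<'EOF'
# Stage 1: full toolchain, used only to compile
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: minimal runtime image; no compiler, shell, or package manager
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]
EOF
}
```

Only the final stage ends up in the tagged image, so the Go compiler and build cache never ship.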
Pin what you build from
  • Tags are mutable: the same tag can point to different image contents over time.
    • latest is especially slippery.
  • For reproducibility and supply-chain control:
    • pin important base images to versioned tags,
    • and for stronger guarantees, pin by digest.
  • This matters even more in multi-node systems, where different replicas must not silently drift to different image contents.
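A digest reference has the form name@sha256: followed by 64 hex characters, which makes it easy to check mechanically, for example in CI. This is a minimal sketch; is_digest_pinned is a hypothetical helper and python:3.12-slim is only an example image.

```shell
# Hypothetical helper: succeed if an image reference is pinned by digest,
# i.e. ends in @sha256: followed by 64 lowercase hex characters.
is_digest_pinned() {
  [[ "$1" =~ @sha256:[0-9a-f]{64}$ ]]
}

# Look up the digest behind a tag once (requires network access):
#   docker buildx imagetools inspect python:3.12-slim
# then copy the "Digest:" value into FROM python:3.12-slim@sha256:<digest>
```

Keeping the tag next to the digest documents intent for humans, while the digest is what actually fixes the contents.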
Do not leak secrets into the build
  • Avoid putting secrets into:
    • COPYed files,
    • ENV,
    • or ARG.
  • Build arguments are not a safe secret mechanism: their values can appear in image history and in provenance attestations.
  • Use BuildKit secret mounts during builds instead.
  • Remember:
    • if a secret enters an image layer, anyone with access to the image can often recover it through image history or layer inspection, and an accidental push to a registry exposes it even further.
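With BuildKit, a secret is mounted as a file for a single RUN step instead of being passed as an argument. The sketch below writes a Dockerfile that does this; the helper write_secret_dockerfile, the secret id api_token, and the script fetch-private-deps.sh are all hypothetical placeholders.

```shell
# Hypothetical helper: write a Dockerfile that consumes a build secret
# without storing it in any image layer.
write_secret_dockerfile() {
  local dir="${1:-.}"
  cat > "$dir/Dockerfile" <<'EOF'
# syntax=docker/dockerfile:1
FROM alpine:3.20
# fetch-private-deps.sh is a hypothetical script that reads API_TOKEN.
COPY fetch-private-deps.sh /usr/local/bin/
# The secret appears at /run/secrets/api_token for this RUN step only;
# it is not stored in the layer as long as the script does not copy it.
RUN --mount=type=secret,id=api_token \
    API_TOKEN="$(cat /run/secrets/api_token)" fetch-private-deps.sh
EOF
}

# Build, passing the secret from a local file (never via --build-arg):
#   docker build --secret id=api_token,src="$HOME/.config/api_token" -t example/app:dev .
```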
Use .dockerignore aggressively
  • Everything in the build context is a candidate for accidental disclosure.
  • A careless build context can send:
    • .git,
    • test fixtures,
    • SSH keys,
    • local .env files,
    • notebook outputs,
    • and grading artifacts to the builder.
  • Use .dockerignore the same way you use __pycache__ cleanup in a tidy repo:
    • not glamorous,
    • but it prevents dumb leaks.
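A starting point for the list above might look like the following. This is a sketch only: the helper write_dockerignore is hypothetical, and directory names such as tests/fixtures/ and grading/ are placeholders to adapt to your repository layout.

```shell
# Hypothetical helper: write a conservative .dockerignore.
write_dockerignore() {
  local dir="${1:-.}"
  cat > "$dir/.dockerignore" <<'EOF'
# Version control metadata
.git
.gitignore
# Local environment files and keys
**/.env
**/.env.*
**/*.pem
**/id_rsa*
**/id_ed25519*
# Python and notebook leftovers
**/__pycache__
**/.ipynb_checkpoints
# Test data and course material the image does not need (placeholders)
tests/fixtures/
grading/
EOF
}
```

Excluded paths are never sent to the builder, so even a careless COPY . . cannot pick them up.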
Scan images and attach metadata
  • Image scanning should be part of CI, not a special ritual performed only after something catches fire.
  • Modern best practice is to produce and store:
    • SBOMs (Software Bills of Materials),
    • and provenance attestations.
  • These support:
    • vulnerability scanning,
    • software inventory,
    • reproducibility,
    • and policy enforcement.
  • In other words: not just “Is this image bad?” but also “What exactly is inside it, and where did it come from?”
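A CI step can produce the SBOM and provenance at build time and then gate on a scan. This sketch assumes Docker Buildx and the Docker Scout CLI plugin are installed and that the job is logged in to the target registry; ci_build_and_scan is a hypothetical name, and the severity threshold is an example policy.

```shell
# Hypothetical CI step: build with SBOM and provenance attestations, push,
# then fail if critical or high CVEs are found.
# Usage: ci_build_and_scan registry.example.com/team/app:1.0
ci_build_and_scan() {
  local image="$1"
  # Attestations are stored alongside the image in the registry, so push directly.
  docker buildx build \
    --sbom=true \
    --provenance=mode=max \
    --tag "$image" \
    --push \
    . || return 1
  # --exit-code makes the scan return non-zero when matching CVEs exist;
  # registry:// scans the pushed image rather than a local copy.
  docker scout cves --only-severity critical,high --exit-code "registry://$image"
}
```

Failing the pipeline here keeps scanning routine instead of an after-the-fact ritual.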
Image trust
  • Older Docker guidance often recommended Docker Content Trust (DCT), which:
    • let publishers sign image tags,
    • let clients that set DOCKER_CONTENT_TRUST=1 pull, run, or build only with signed image tags.
    • DCT is being retired.
  • DCT was mainly about “is this tag signed?”
  • Modern container trust asks a broader set of questions:
    • Who signed this image?
    • What claims are attached to it?
    • How was it built, and from what source?
  • A better modern vocabulary is:
    • Signature
      • proves that a trusted identity signed a specific artifact.
      • Example tools: Sigstore Cosign and Notation.
    • Attestation
      • signed metadata attached to an image, such as an SBOM, vulnerability/exemption data, or custom policy claims.
      • Docker BuildKit generates attestations and stores them with the image; Sigstore Cosign can also sign and verify in-toto attestations.
    • Provenance
      • a specific kind of attestation describing how the image was built:
        • build timestamps,
        • build parameters,
        • source repository and revision,
        • build materials,
        • platform,
        • reproducibility metadata.
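With Sigstore Cosign, signing and verification are separate steps, and attestations use the same key material. A minimal sketch, assuming Cosign v2 and a key pair created once with cosign generate-key-pair; sign_image and verify_image are hypothetical wrapper names, and the SBOM file name is a placeholder.

```shell
# Hypothetical wrappers around Sigstore Cosign using a local key pair.
# Sign by digest, not tag, so the signature covers exact contents.
sign_image() {
  local image_with_digest="$1"   # e.g. registry.example.com/team/app@sha256:...
  # --yes skips interactive confirmation prompts
  cosign sign --yes --key cosign.key "$image_with_digest"
}

verify_image() {
  local image="$1"
  # Exits non-zero if no valid signature from cosign.pub is found
  cosign verify --key cosign.pub "$image" > /dev/null
}

# Attestations (for example an SPDX SBOM) are attached and checked similarly:
#   cosign attest --key cosign.key --type spdxjson --predicate sbom.spdx.json "$image_with_digest"
#   cosign verify-attestation --key cosign.pub --type spdxjson "$image"
```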
Hands-on: Another Registry Besides Docker Hub

Fork from the following repository

Runtime, Storage, Network, and Secrets

Run as a non-root user
  • In Dockerfile:
    • create and use an application user with USER.
    • see example in the Dockerfile for our fabric-examples repo.
  • At runtime:
    • assume compromise will happen eventually,
      • then ask: what does the attacker get?
      • A bad answer: root inside the container plus writable host paths.
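Whether an image drops root can be checked before it ever runs by looking at the final stage of its Dockerfile. The lint below is a hypothetical sketch: a stage without USER inherits its base image's user, so the sketch conservatively reports root in that case (this gives a false alarm for bases like distroless :nonroot).

```shell
# Hypothetical lint: print the user the final Dockerfile stage runs as.
final_stage_user() {
  awk '
    toupper($1) == "FROM" { user = "" }   # new stage: reset
    toupper($1) == "USER" { user = $2 }   # last USER in the stage wins
    END { print (user == "" ? "root" : user) }
  ' "$1"
}

# Succeed only if the final stage switches to a non-root user.
runs_as_non_root() {
  case "$(final_stage_user "$1")" in
    root|root:*|0|0:*) return 1 ;;
    *) return 0 ;;
  esac
}
```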
Least privilege at runtime
  • Good hardening defaults for many application containers:
    • --cap-drop=ALL
    • add back only the few capabilities that are truly needed,
    • --security-opt no-new-privileges=true
    • --read-only
    • explicit writable volumes or tmpfs for only the required paths.
  • Why?
    • fewer capabilities,
    • fewer privilege transitions,
    • smaller filesystem mutation surface.
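The defaults above combine into a single docker run invocation. This is a sketch: run_hardened is a hypothetical wrapper, UID 10001 is an arbitrary example, and it assumes the image only writes to /tmp. If a service truly needs a capability, add it back explicitly with --cap-add=NAME.

```shell
# Hypothetical wrapper: start a container with least-privilege defaults.
# Usage: run_hardened web registry.example.com/team/app:1.0
run_hardened() {
  local name="$1" image="$2"
  # --user: non-root UID:GID (example value)
  # --cap-drop=ALL: start from zero capabilities
  # no-new-privileges: block setuid/setcap privilege gains
  # --read-only + --tmpfs: immutable root fs, RAM-backed scratch in /tmp
  docker run -d --name "$name" \
    --user 10001:10001 \
    --cap-drop=ALL \
    --security-opt no-new-privileges=true \
    --read-only \
    --tmpfs /tmp:rw,noexec,nosuid,size=64m \
    "$image"
}
```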
Dangerous mounts
  • High-risk examples:
    • mounting / from the host,
    • mounting /var/run/docker.sock,
    • broad writable bind mounts into sensitive host directories.
  • The Docker socket is especially dangerous:
    • if a container can talk to the daemon, it can often control other containers and effectively control the host.
  • Treat writable host mounts as attack paths.
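A pre-deployment review can flag the worst mount sources automatically. This hypothetical check matches only exact host paths and its list is illustrative, not exhaustive; it does not replace reading the mount mode and purpose.

```shell
# Hypothetical check: flag bind-mount sources that can hand a container
# control of the host.
is_dangerous_mount() {
  local src="${1%/}"          # normalize a trailing slash ("/" becomes "")
  case "${src:-/}" in
    /|/var/run/docker.sock|/run/docker.sock|/var/lib/docker|/etc|/root|/proc|/sys|/boot|/dev)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```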
Security in storage choices
  • Volumes are better for persistent application data.
  • Bind mounts are more appropriate for development or deliberate host integration.
  • tmpfs is useful for temporary sensitive data.
  • Security interpretation:
    • prefer volumes for databases and persistent service data,
    • use bind mounts sparingly and preferably read-only,
    • use tmpfs for secrets or scratch data that should not persist to disk.
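One container can use each storage type for its intended job. A sketch with --mount syntax; the wrapper name, image, volume name, and paths are hypothetical placeholders.

```shell
# Hypothetical example:
#   volume  -> persistent application data managed by Docker
#   bind    -> host config directory, mounted read-only
#   tmpfs   -> 16 MiB RAM-only scratch space, gone when the container stops
run_with_storage() {
  docker run -d --name app \
    --mount type=volume,src=app-data,dst=/var/lib/app \
    --mount type=bind,src="$PWD/config",dst=/etc/app,readonly \
    --mount type=tmpfs,dst=/run/app-scratch,tmpfs-size=16777216 \
    registry.example.com/team/app:1.0
}
```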
Port publishing and network exposure
  • Publishing ports is not neutral.
  • By default, published ports listen on all host interfaces, so they are reachable from other machines unless you bind them to 127.0.0.1.
  • Good habits:
    • publish only what you need,
    • bind admin/debug ports to 127.0.0.1 when possible,
    • do not use host networking unless you truly need it,
    • segment services on dedicated bridge or overlay networks.
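The difference is visible in the -p spec itself: a spec without an IP listens on all interfaces. A hypothetical check for review scripts, plus a loopback-only example; the image name is a placeholder.

```shell
# Hypothetical check: succeed only if a -p/--publish spec binds to loopback.
is_loopback_publish() {
  case "$1" in
    127.0.0.1:*:*|'[::1]':*:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: an admin UI reachable only from the host itself
#   docker run -d -p 127.0.0.1:8081:8080 registry.example.com/team/admin:1.0
```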
Secrets: files over environment variables
  • Environment variables are convenient, but they are also easy to leak:
    • logs,
    • accidental dumps,
    • debugging output,
    • screenshots,
    • student copy-paste.
  • Prefer secret files mounted at runtime.
  • In Compose and Swarm, secrets are mounted under /run/secrets/....
  • This is cleaner and safer than putting passwords into .env files.
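Inside the container, an entrypoint reads the secret from the mounted file. This is a sketch: read_secret is hypothetical, the ${NAME}_FILE lookup follows a convention used by many images (an assumption, not a Docker requirement), and SECRETS_DIR is a hypothetical override for local testing.

```shell
# Hypothetical entrypoint helper: read a secret from a mounted file.
# Lookup order: path in ${NAME}_FILE if set, else ${SECRETS_DIR:-/run/secrets}/<name>.
read_secret() {
  local name="$1"
  local var="${name^^}_FILE"                 # db_password -> DB_PASSWORD_FILE
  local path="${!var:-${SECRETS_DIR:-/run/secrets}/$name}"
  if [ ! -r "$path" ]; then
    echo "secret '$name' not readable at $path" >&2
    return 1
  fi
  # Command substitution drops trailing newlines, which secret files often have.
  local value
  value="$(cat "$path")"
  printf '%s' "$value"
}
```

Typical use is db_password="$(read_secret db_password)", kept in a non-exported shell variable so it does not leak into child process environments.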

Compose Security

Compose as policy
  • Security benefit:
    • Compose lets you encode runtime policy into version-controlled files.
  • Instead of relying on memory:
    • “Did I remember --read-only this time?”
  • you can declare, in the service definition:
    • read_only: true
    • cap_drop:
    • security_opt:
    • tmpfs:
    • secrets:
    • user:
    • ports:
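Put together, those keys make the hardening reviewable in version control. A sketch that writes such a file; the helper name, service name, image, UID, and secret file path are hypothetical placeholders.

```shell
# Hypothetical helper: write a compose.yaml that encodes runtime policy.
write_compose_policy() {
  local dir="${1:-.}"
  cat > "$dir/compose.yaml" <<'EOF'
services:
  api:
    image: registry.example.com/team/api:1.0
    user: "10001:10001"
    read_only: true
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    tmpfs:
      - /tmp
    ports:
      - "127.0.0.1:8080:8080"
    secrets:
      - db_password
secrets:
  db_password:
    file: ./secrets/db_password.txt
EOF
}

# Validate and view the resolved configuration before deploying:
#   docker compose -f compose.yaml config
```

The secret then appears inside the api container at /run/secrets/db_password.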

Best Practices Summary

Secure-by-default checklist
  • Keep the host kernel, Docker Engine, and runtime up to date.
  • Prefer rootless mode when feasible.
  • Otherwise use user namespace remapping when applications insist on root.
  • Use minimal base images and multi-stage builds.
  • Pin important images to versions and preferably digests.
  • Run containers as non-root.
  • Drop unnecessary capabilities.
  • Use no-new-privileges.
  • Use a read-only root filesystem whenever possible.
  • Prefer volumes over broad writable bind mounts.
  • Use read-only bind mounts for config files.
  • Use tmpfs for sensitive temporary data.
  • Publish only required ports, and bind to localhost when external access is not needed.
  • Avoid --privileged, host networking, and dangerous host mounts.
  • Keep secrets out of images, out of build args, and preferably out of environment variables.
  • Use Compose / Swarm secrets and BuildKit secret mounts.
  • Add image scanning, SBOMs, and provenance attestations to CI.
  • Treat image trust as a signing / provenance problem, not a “just enable DCT” checkbox.