Advanced Linux: The Building Blocks of Containers


1. Beyond the Hypervisor

Recap: The Virtualization Cost
  • In the previous lecture, we discussed Hypervisors (Type 1 & 2) and VMs.
  • Each VM runs a full guest OS kernel on virtualized hardware, which adds overhead (handling privileged instructions across CPU rings, an extra layer of memory mapping).
  • Question: Can we isolate processes without simulating hardware?
  • Answer: Yes, by using features built directly into the Linux Kernel.
The Three Pillars of Containerization
  • Namespaces: What a process can see (Isolation).
  • Cgroups (Control Groups): What a process can use (Resource Limiting).
  • chroot/pivot_root: Where the process thinks root (/) is (Filesystem Isolation).
  • Docker is, at its core, tooling built on top of these three Linux primitives.

2. Setup and Preparation

Environment Setup
  • Platform: CloudLab.
  • Launch from the main branch of your CloudLab class profile.
  • Goal: We will create a “container” without using Docker, using only Linux kernel primitives.
Install Dependencies
  • We need tools to manage control groups and simulate load.
  • SSH into your CloudLab node.
  • Run the following:
sudo apt update
sudo apt install -y cgroup-tools stress debootstrap
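Before moving on, it can help to confirm that the commands used later in the lecture are actually on the PATH. The following is a minimal sketch: check_tools is a hypothetical helper name, and the tool list assumes that cgcreate and cgexec come from cgroup-tools while unshare and chroot ship with the base system.

```shell
#!/bin/sh
# check_tools: hypothetical helper (not part of any package).
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok:      $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Commands used in the hands-on sections of this lecture.
check_tools cgcreate cgexec stress debootstrap unshare chroot \
    || echo "Install the missing packages before continuing."
```

If every line starts with "ok:", the node should be ready for the exercises.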

3. Namespaces (Isolation)

Overview
  • Namespaces wrap a global system resource in an abstraction.
  • To the processes within the namespace, it appears they have their own isolated instance of the global resource.
  • Common Namespaces:
    • PID: Process IDs (Process 1 inside container vs Process 1234 on host).
    • MNT: Mount points and filesystems.
    • NET: Network interfaces, stacks, ports.
    • UTS: Hostname and domain name.
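The kernel exposes the namespaces a process belongs to as symbolic links under /proc/<pid>/ns. The sketch below prints them for the current shell; list_ns is a hypothetical helper name, and the optional directory argument exists only so the function can be pointed at another process's ns directory.

```shell
#!/bin/sh
# list_ns: hypothetical helper that prints the namespace links in a directory.
# With no argument it inspects /proc/self/ns (the calling process).
list_ns() {
    dir="${1:-/proc/self/ns}"
    for link in "$dir"/*; do
        # Each entry is a symlink whose target looks like "pid:[4026531836]".
        [ -L "$link" ] || continue
        printf '%-18s %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}

list_ns
```

Two processes that share a namespace show the same number in brackets. A shell started in a new PID namespace would be expected to show a different value on its pid line than a shell on the host.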
Hands-on: The PID Namespace
  • Goal: Create a shell that thinks it is PID 1 (like a container).
  • Open your CloudLab terminal.
  • Check your current PID:
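echo $$
  • Use unshare to create a new PID namespace and fork a bash process inside it:
    • --fork is needed because only children of the unsharing process enter the new PID namespace; --mount-proc mounts a fresh /proc so that ps reflects the new namespace.
sudo unshare --fork --pid --mount-proc bash
  • Check the PID inside this new environment:
echo $$
ps aux
  • Observation: You should see only a few processes, and your bash shell should be PID 1.
  • Type exit once to return to the host, then check the PID again. It should match the value you saw before running unshare:
exit
echo $$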

4. Cgroups (Resource Administration)

Overview
  • While Namespaces hide resources, Control Groups (cgroups) limit them.
  • Originally developed by Google (started as “Process Containers”).
  • Organized in a hierarchy (tree structure) located at /sys/fs/cgroup.
  • Controls: Memory limits, CPU quotas, I/O throttling.
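The file names under /sys/fs/cgroup depend on whether the node uses cgroup v1 or v2, and the hands-on below relies on the v2 file memory.max. The following sketch guesses the layout from the files that are present; cgroup_version is a hypothetical helper name, and the detection rule (a cgroup.controllers file at the root means v2, a memory directory means v1) is a simplification that treats hybrid setups as v1.

```shell
#!/bin/sh
# cgroup_version: hypothetical helper that guesses the cgroup layout.
# $1 is the cgroup mount point (defaults to /sys/fs/cgroup).
cgroup_version() {
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        echo "v2"          # unified hierarchy
    elif [ -d "$root/memory" ]; then
        echo "v1"          # one directory per controller
    else
        echo "unknown"
    fi
}

version=$(cgroup_version)
echo "cgroup layout: $version"
if [ "$version" = "v2" ]; then
    # Controllers available at the root of the hierarchy.
    echo "controllers: $(cat /sys/fs/cgroup/cgroup.controllers)"
fi
```

On a v1 layout the memory limit lives in memory.limit_in_bytes under /sys/fs/cgroup/memory/ rather than in memory.max.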
Hands-on: Manually Creating a Cgroup
  • Goal: Create a “jail” that limits a process to 100MB of RAM.
  • Create a new cgroup called mygroup:
sudo cgcreate -g memory:mygroup
  • Set the limit to 100MB, expressed in bytes (memory.max is the cgroup v2 interface file):
# 100MB here means 100 * 1024 * 1024 = 104857600 bytes
echo 104857600 | sudo tee /sys/fs/cgroup/mygroup/memory.max
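The byte value can be computed rather than typed by hand, and the limit can be read back from the same file it was written to. A small sketch, assuming the mygroup cgroup from the step above; mib_to_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# mib_to_bytes: hypothetical helper, converts a size in MiB (1024 * 1024 bytes)
# to bytes using shell arithmetic.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

limit=$(mib_to_bytes 100)
echo "$limit"    # 104857600

# Read the limit back, if the cgroup exists on this machine.
limit_file=/sys/fs/cgroup/mygroup/memory.max
if [ -r "$limit_file" ]; then
    echo "kernel reports: $(cat "$limit_file")"
else
    echo "$limit_file not found; create the cgroup first"
fi
```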
Hands-on: Testing the Limit
  • Run a stress test inside that cgroup that tries to eat 99MB of RAM.
    • You will need to press Ctrl-C to terminate the running process.
sudo cgexec -g memory:mygroup stress --vm 1 --vm-bytes 99M --vm-keep
  • Now run a stress test inside the same cgroup that tries to use 101MB of RAM:
sudo cgexec -g memory:mygroup stress --vm 1 --vm-bytes 101M --vm-keep
  • Observation: The process should fail or be killed by the OOM (Out of Memory) Killer almost immediately. If the node has swap enabled, it may instead keep running slowly while pages are swapped out.
  • Compare this to running it without the cgroup (which would succeed).
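On cgroup v2, each cgroup keeps counters in a memory.events file, which gives a way to confirm that the limit (and not something else) stopped the process. The sketch below reads two of those counters; cgroup_event is a hypothetical helper name, and the path assumes the mygroup cgroup created earlier.

```shell
#!/bin/sh
# cgroup_event: hypothetical helper.
# $1 = counter name (e.g. max, oom_kill), $2 = path to a memory.events file.
# The file holds one "name value" pair per line.
cgroup_event() {
    awk -v key="$1" '$1 == key { print $2 }' "$2"
}

events=/sys/fs/cgroup/mygroup/memory.events
if [ -r "$events" ]; then
    echo "times the limit was hit: $(cgroup_event max "$events")"
    echo "processes OOM-killed:    $(cgroup_event oom_kill "$events")"
else
    echo "$events not readable; has the cgroup been created?"
fi
```

After the 101MB run, the oom_kill counter would be expected to be greater than zero if the OOM killer was responsible.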

5. The Filesystem (Chroot)

Overview
  • How do containers have different files than the host (e.g., Ubuntu container on CentOS host)?
  • They change the “root” directory via chroot.
  • Docker uses advanced “Copy-on-Write” filesystems (OverlayFS), but chroot is the ancestor concept.
Hands-on: Building a ‘Container’ from Scratch
  • Goal: Create a mini-filesystem and lock a process inside it.
  • Create a folder for our new root (run this from your home directory, /users/$USER on CloudLab):
mkdir container
  • Use debootstrap to set up a base filesystem inside it:
sudo debootstrap --variant=minbase stable /users/$USER/container http://deb.debian.org/debian
  • Mount the essential virtual filesystems:
for dir in dev proc sys; do sudo mount --bind /$dir /users/$USER/container/$dir; done
  • Enter the isolated file system jail:
sudo unshare --mount --uts --ipc --pid --fork chroot /users/$USER/container /bin/bash
  • Try listing /home and /users. /home is empty and /users does not exist here! You are seeing the container’s filesystem, not the host’s.
  • Exit the container when done:
exit
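The bind mounts created on the host for dev, proc and sys stay in place after the shell exits. The sketch below lists the mount points under the container root, deepest first, and prints a matching umount command for each one without running it. mounts_under is a hypothetical helper name; it assumes mount points without spaces and reads /proc/self/mounts by default.

```shell
#!/bin/sh
# mounts_under: hypothetical helper.
# $1 = directory, $2 = mount table to read (defaults to /proc/self/mounts).
# Prints every mount point below $1, deepest paths first, so that nested
# mounts can be released before their parents.
mounts_under() {
    root="$1"
    table="${2:-/proc/self/mounts}"
    [ -r "$table" ] || return 0
    # Field 2 of each line in the mount table is the mount point.
    awk -v root="$root" 'index($2, root "/") == 1 { print $2 }' "$table" \
        | LC_ALL=C sort -r
}

container_root="/users/${USER:-$(id -un)}/container"   # path used in this lecture
for mp in $(mounts_under "$container_root"); do
    echo "sudo umount $mp"    # printed only; run by hand after checking
done
```

If the loop prints nothing, no bind mounts are left under the container directory.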

6. Process Isolation (Namespaces)

Concept: PID Namespace
  • chroot alone only changes the filesystem view: with the host’s /proc mounted inside it, ps would still list all of the host’s processes.
  • We need a PID Namespace to hide the host processes.
Step 1: Unshare and isolate
  • We will use unshare to create a new namespace, then immediately chroot into our folder.
sudo unshare --mount --uts --ipc --pid --fork chroot /users/$USER/container /bin/bash
  • Next, from inside the container, we install ps (package procps) and mount a fresh /proc that belongs to the new PID namespace:
apt update
apt install -y procps
mount -t proc proc /proc
Step 2: Verify Isolation
  • Run ps aux inside the container.
  • Observation: You should see your bash process as PID 1.
  • On the host, this process might be PID 12345, but inside the namespace, it is PID 1.
  • This is exactly how Docker containers perceive themselves as the only thing running on the machine.
  • Type exit to return to the host.
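The two views of the same process can also be checked from the host: the NSpid line in /proc/<pid>/status lists a process's PID in each nested PID namespace, outermost first. The sketch below extracts that line; nspid is a hypothetical helper name, and the line is only present on kernels that expose it (4.1 and later).

```shell
#!/bin/sh
# nspid: hypothetical helper.
# $1 = path to a /proc/<pid>/status file.
# Prints the PIDs from the NSpid line separated by single spaces.
nspid() {
    awk '$1 == "NSpid:" {
        for (i = 2; i <= NF; i++)
            printf "%s%s", $i, (i < NF ? " " : "\n")
    }' "$1"
}

# For a process on the host this prints a single number.
nspid /proc/self/status
```

Called from a second host terminal on the status file of the container's bash (its host PID can be found with ps on the host), it would be expected to print two numbers: the host PID followed by 1.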

7. Looking Ahead: Why Docker?

The Administration Nightmare
  • Imagine doing the steps above for every single application you deploy.
  • Manual cgroup math, manual dependency copying, manual network bridging.
  • Docker is a daemon that automates:
    • Creating Namespaces.
    • Configuring Cgroups.
    • Managing Filesystems.
  • Next lecture, we will see how docker run replaces all these manual commands.
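To make the amount of manual work concrete, the commands from sections 4 to 6 can be collected into one function. This is a rough sketch for illustration only, not how Docker is implemented: mini_run is a hypothetical name, it only prints the commands instead of running them, and it covers neither networking nor image management.

```shell
#!/bin/sh
# mini_run: hypothetical dry-run wrapper around the steps from this lecture.
# Usage: mini_run NAME MEM_MIB ROOTFS COMMAND...
# It prints the commands that would be run; nothing is executed.
mini_run() {
    name="$1"; mem_mib="$2"; rootfs="$3"
    shift 3
    bytes=$(( mem_mib * 1024 * 1024 ))
    # 1. Create the cgroup and set its memory limit (section 4).
    echo "sudo cgcreate -g memory:$name"
    echo "echo $bytes | sudo tee /sys/fs/cgroup/$name/memory.max"
    # 2. Start the command in that cgroup, in new namespaces, inside the
    #    new root filesystem (sections 5 and 6).
    echo "sudo cgexec -g memory:$name unshare --mount --uts --ipc --pid --fork chroot $rootfs $*"
}

mini_run mygroup 100 "/users/${USER:-$(id -un)}/container" /bin/bash
```

Every application would need its own name, limit and root filesystem, which is the bookkeeping a container runtime takes over.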