Containers are chroot with a Marketing Budget
There are many ways to understand how containers work, but most useful explanations are actually simplifications.
Many people have settled on explaining containers by calling them ‘lightweight VMs’, and they’re lightweight because they ‘share the kernel with the host’. That is useful, but it simplifies a lot away. What’s a ‘lightweight VM’? What does sharing the kernel mean?
Others will tell you containers are about namespaces and specific kernel visibility tweaks. That is also a helpful explanation because namespaces partition visibility, so that running containers can’t see other things on the same machine.
But for me, containers are just chrooted processes. Sure, they’re more than that: containers have a nice developer experience, an open-source foundation, and a whole ecosystem of cloud-native companies pushing them forward. But let me show you why I think chroot is the key.
So, let’s build a container runtime using only the chroot system call. Doing so, we can learn a little about chroot, a little about container runtimes, and it’ll also be fun!
The Goal
By the end, I’ll have something that looks like docker run, called chrun, where you can pull docker images:
> chrun pull redis
Pulling image redis
export image 16b87aa63c8f3a1e14a50feb94cba39eaa5d19bec64d90ff76c3ded058ad09c8
And then run them:
Running /usr/local/bin/redis-server in /tmp/_assets_redis_tar_gz4234401501
4360:C 31 Oct 2022 16:07:57.253 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
4360:C 31 Oct 2022 16:07:57.253 # Redis version=7.0.5, bits=64,
4360:C 31 Oct 2022 16:07:57.253 # Warning: no config file specified, using the
4360:M 31 Oct 2022 16:07:57.256 * Increased maximum number of open files to
4360:M 31 Oct 2022 16:07:57.256 * monotonic clock: POSIX clock_gettime
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 7.0.5 (00000000/0) 64 bit
.-`` .-` `. ` `/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 4360
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | https://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
4360:M 31 Oct 2022 16:07:57.260 # Server initialized
4360:M 31 Oct 2022 16:07:57.265 * Ready to accept connections
And it will do this using chroot. But first, some background.
History of chroot
chroot probably doesn’t get a lot of mention now that containers exist, but it’s a Unix system call. This means it’s a way to request something from the operating system kernel. It is also a utility program, so it’s easy to call from the shell.
All it does is change the root directory (/) to a new value. That’s all chrooting does. It just changes what / means. That sounds simple, but file paths are at the heart of how Unix works, so you can do a lot with this call.
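To make "changing what / means" concrete, here is a minimal Go sketch of how an absolute path seen by a chrooted process maps back onto the host. The hostPath helper is hypothetical and only models lexical path resolution (it does not follow symlinks or call the kernel).

```go
package main

import (
	"fmt"
	"path/filepath"
)

// hostPath returns where an absolute path seen by a chrooted process
// would live on the host, given the directory used as the new root.
// It only models lexical resolution; symlinks are not followed.
func hostPath(root, p string) string {
	// Cleaning "/"+p collapses any ".." that would climb above "/",
	// mirroring the fact that ".." at the root stays at the root.
	return filepath.Join(root, filepath.Clean("/"+p))
}

func main() {
	fmt.Println(hostPath("/testroot", "/hello"))            // /testroot/hello
	fmt.Println(hostPath("/testroot", "/../../etc/passwd")) // /testroot/etc/passwd
}
```

The second call shows the point of the jail: walking up with .. from the new root does not get back to the host's /etc.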
Chroot is a much older system call than the ones modern container runtimes use, which means, in theory, the chrun shown above could run on a much older Linux kernel. But how far back into Linux history could we go?
Actually, we can go back to way before the creation of Linux. chroot first appeared in 1979 for Unix v7.
(I know this because Diomidis Spinellis put together this excellent GitHub repository that recreates the history of Unix from the earliest available source to today’s modern versions. The history recreated in this repo stretches back to 1970 and includes the original PDP-7 assembly code of the first iteration of Unix.)
It came along with chdir (the system call equivalent of cd) and looked like this:
struct user
{
...
struct inode *u_cdir; /* pointer to inode of current directory */
struct inode *u_rdir; /* root directory of current process */
...
}
&u is a reference to the current user’s struct, which holds u_rdir and u_cdir.
So, a user on a Unix system has a current directory and a root directory, and chroot is a way to change the root value (u_rdir) in the same way cd changes the current working directory (u_cdir). In Unix V7 that’s basically all the chroot code I see, apart from the syscall list and some userland code so you can call chroot from your shell:
/ C library -- chroot
/ error = chroot(string);
.globl _chroot
.globl cerror
.chroot = 61.
_chroot:
mov r5,-(sp)
mov sp,r5
mov 4(r5),0f
sys 0; 9f
bec 1f
jmp cerror
1:
clr r0
mov (sp)+,r5
rts pc
.data
9:
sys .chroot; 0:..
So chroot goes way back, back into the 70s, and while the implementation has probably changed over time, semantically it still matches the description found in the UNIX V7 Manual:
Chroot sets the root directory, the starting point for path names beginning with /. The call is restricted to the super-user.
Okay, history lesson over. Let’s start building things.
Using chroot Today
Let’s start with the command-line and work towards our docker run clone.
The most straightforward docker run is hello-world:
> docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
...
Recreating this hello-world run in a chroot jail is relatively straightforward.
chroot Hello World
At the command line, I can set up the hello-world in a changed root like so:
Then run it:
> chroot /testroot /hello
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
The Root of the Matter
chroot only works as a root user, so assume from here on out everything is being done as root on a Linux machine.
If you try as a non-root user, you will get something like this:
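We can also do this from Go, making the system call directly:
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Change this process's root directory; requires root privileges.
	if err := syscall.Chroot("/testroot"); err != nil {
		log.Fatalf("chroot: %v", err)
	}
	// Move the working directory inside the new root.
	if err := os.Chdir("/"); err != nil {
		log.Fatalf("chdir: %v", err)
	}
	// /hello is resolved relative to the new root when the command starts.
	cmd := exec.Command("/hello")
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatalf("run: %v", err)
	}
}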
And the output is the same:
> go run go-change-root.go
Hello from Docker!
This message shows that your installation appears to be working correctly.
...
That hello process runs with a filesystem rooted to /testroot. So there is nothing in the filesystem it can see besides itself.
I could verify that by running a shell inside it and poking around. However, when you change the root, the command passed is relative to the new root, so running /bin/sh will fail.
I could cp /bin/sh into /testroot, but sh dynamically links in libc and probably other stuff. Those file pointers won’t point to anything in our new root so it won’t work. This is also why you can’t shell into the hello-world image with docker run:
> docker run -it hello-world /bin/sh
exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown.
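To see what a binary would need inside the new root, you can list the shared libraries it asks for. The sketch below uses Go's standard debug/elf package; neededLibs is a hypothetical helper, and /bin/sh is just an assumed default path on the host.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// neededLibs lists the shared libraries an ELF binary asks the dynamic
// linker to load (its DT_NEEDED entries).
func neededLibs(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.ImportedLibraries()
}

func main() {
	path := "/bin/sh" // assumed location of a shell on the host
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	libs, err := neededLibs(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(path, "needs:", libs)
}
```

Every library it prints, plus the dynamic loader itself, would have to exist at the same path under the new root before the binary could start there.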
Predictably, you can only shell into an image with a shell (and supporting userspace dependencies) inside it. So I’m going to be using redis:latest today:
> docker run -it redis /bin/sh
> cd /
> ls
bin boot data dev etc home lib lib64 media mnt opt proc root run
sbin srv sys tmp usr var
Now let’s try to chroot into this redis image. But to do that, I first need to get the file-system out of the image so I can pass it to chroot.
To extract the file-system from the image, the first thing I’ll try is to grab the image and extract it:
I can then look inside it:
redis
├── 131d224a301217b1d881f2464837d310dc8e0bf701d049fc30fb9eabddd98cbc
│ ├── VERSION
│ ├── json
│ └── layer.tar
├── 2279e9cb00a8a268cb01a1ccd1b7c0a01dc6b9ec619a7877dda2ca81e7409428
│ ├── VERSION
│ ├── json
│ └── layer.tar
├── 2d0405b8f23157bc9f45cadc12b8b7ff23446dfe968bfa7473cb78ec2444d198
│ ├── VERSION
│ ├── json
│ └── layer.tar
├── 770413d3495f9ba555e345d5c5397580a61cc64d9a945135b4b2235eed19d07b
│ ├── VERSION
│ ├── json
│ └── layer.tar
├── beb3916dbc72988060eaa0ba9ba119c76eb1c07db1c18ea53d3ca4f40a03c436
│ ├── VERSION
│ ├── json
│ └── layer.tar
├── c2342258f8ca7ab5af86e82df6e9ade908a949216679667b0f39b59bcd38c4e9.json
├── f7b46deebf614151dce2888bcb81e312da2ac791230b02688a5dbab1dee7ea91
│ ├── VERSION
│ ├── json
│ └── layer.tar
├── manifest.json
└── repositories
I’m on the right track, but this is not exactly what I wanted. Each layer.tar is the union file-system changes for that image layer. To build the completed file structure I would need to extract each of these and combine them in the right order.
Thankfully, I can just ask docker to do that for us with docker export.
> docker export $(docker create redis) -o redis.tar.gz
> mkdir redis && cd redis
> tar --no-same-owner --no-same-permissions --owner=0 --group=0 \
    -mxf ../redis.tar.gz
Then I end up with the extracted redis file structure:
./redis
├── bin
├── boot
├── data
├── dev
├── etc
├── home
├── lib
├── lib64
├── media
├── mnt
├── opt
├── proc
├── root
├── run
├── sbin
├── sys
├── tmp
├── usr
└── var
And so if I wrap that docker export up into a bash script, I can grab the file system for any image on docker hub, turning any Linux container image into a tar file.
./pull "redis"
Pulling image redis
export image c20f5ecac2f9c49521b32433ffc6abeade950e77592805b0fc61fea00d6e32f5
From there, my trusty rusty chroot command works much like my docker run -it redis /bin/sh from a few steps back:
> chroot ./redis /bin/sh
> ls
bin boot data dev etc home lib lib64 media mnt opt proc root run
sbin srv sys tmp usr var
It’s Just a Process
Here is why this is interesting from a learning perspective:
When I run docker run .. something happens – an image is turned into a container and started up. It’s not really a VM, but if you shell inside and look around, it seems like one. But now, with chroot at hand, you can see what ‘not really a VM’ means: It’s just a process!
Namespaces mean that a process started in a container can’t see the other processes on the machine, and cgroups mean that the process can have CPU and memory limits placed on it, but really, at a conceptual level, it’s just a process running with a different file-system root. Really, containers are just a fancier way to chroot something!
Ok, let’s keep going.
ChRun Time
Another thing you may have noticed about containers is that they are ephemeral and relatively isolated. I can run N containers from one image and they will each be unique. Modern container runtimes use a union file-system (like overlayfs) for this, but I can get close to that with just temp directories.
Here’s my plan. When chrun pull <imagename> is called, I grab a tar of the image and store it somewhere. Then each time chrun run <imagename> is called, I’ll do the following:
- Create a temporary directory
- Extract <imagename>.tar.gz into it
- Change root into that directory
- On exit, delete the directory
It looks like this:
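func main() {
	// Usage: chrun run <imagename> <command>
	tar := fmt.Sprintf("./assets/%s.tar.gz", os.Args[2])
	cmd := os.Args[3]
	// Each run gets its own scratch copy of the image's file system.
	dir := createTempDir(tar)
	defer os.RemoveAll(dir)
	must(unTar(tar, dir))
	chroot(dir, cmd)
}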
First, I create a temp directory:
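func createTempDir(name string) string {
	// Turn the tar path into a prefix that is safe to use in a directory name.
	var nonAlphanumericRegex = regexp.MustCompile(`[^a-zA-Z0-9 ]+`)
	prefix := nonAlphanumericRegex.ReplaceAllString(name, "_")
	// os.MkdirTemp replaces the deprecated ioutil.TempDir (Go 1.16+).
	dir, err := os.MkdirTemp("", prefix)
	if err != nil {
		log.Fatal(err)
	}
	return dir
}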
Then I untar things:
func unTar(source string, dst string) error {
	r, err := os.Open(source)
	if err != nil {
		return err
	}
	defer r.Close()
	// extract.Archive detects the archive format and unpacks it into dst.
	ctx := context.Background()
	return extract.Archive(ctx, r, dst, nil)
}
And then chroot, and we’re off:
func chroot(root string, call string) {
	fmt.Printf("Running %s in %s\n", call, root)
	cmd := exec.Command(call)
	// Change this process's root, then move the working directory inside it.
	must(syscall.Chroot(root))
	must(os.Chdir("/"))
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	must(cmd.Run())
}
(There are actually a couple more bits to it, but they are uninteresting. Entire file in this repo.)
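One detail worth keeping in mind: syscall.Chroot changes the root of the calling process itself, so any path the parent uses afterwards (including a temp dir path passed to os.RemoveAll) is resolved inside the new root. The full source may deal with this differently; as an alternative sketch, on Linux the child alone can be chrooted through SysProcAttr.Chroot. The chrootedCommand helper below is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// chrootedCommand prepares call to run with root as its file-system root.
// Only the child is chrooted (via SysProcAttr.Chroot, Linux only), so the
// parent keeps its original view of the file system.
func chrootedCommand(root, call string) *exec.Cmd {
	cmd := exec.Command(call)
	cmd.Dir = "/" // working directory, interpreted inside the new root
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{Chroot: root}
	return cmd
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: run <rootdir> <command>")
		os.Exit(1)
	}
	root, call := os.Args[1], os.Args[2]
	fmt.Printf("Running %s in %s\n", call, root)
	if err := chrootedCommand(root, call).Run(); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	// The parent was never chrooted, so host paths such as root still
	// refer to the same directories here and can be cleaned up normally.
}
```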
And with that, I can do things like start up a redis client and server and have them talk to each other:
> ./chrun pull redis
Pulling image redis
export image 16b87aa63c8f3a1e14a50feb94cba39eaa5d19bec64d90ff76c3ded058ad09c8
chrun pulls an image from docker hub and builds a tar archive of it (docker export does the heavy lifting).
Running /usr/local/bin/redis-server in /tmp/_assets_redis_tar_gz4234401501
4360:C 31 Oct 2022 16:07:57.253 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
4360:C 31 Oct 2022 16:07:57.253 # Redis version=7.0.5, bits=64,
4360:C 31 Oct 2022 16:07:57.253 # Warning: no config file specified, using the
4360:M 31 Oct 2022 16:07:57.256 * Increased maximum number of open files to 10032 (it was originally set to 1024).
4360:M 31 Oct 2022 16:07:57.256 * monotonic clock: POSIX clock_gettime
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 7.0.5 (00000000/0) 64 bit
.-`` .-` `. ` `/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 4360
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | https://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
4360:M 31 Oct 2022 16:07:57.260 # Server initialized
4360:M 31 Oct 2022 16:07:57.265 * Ready to accept connections
chrun run extracts the tar to a temp dir, changes root to it, and then starts the passed command. Afterward, it cleans up the temp dir.
> chrun run redis "/usr/local/bin/redis-cli"
Running /usr/local/bin/redis-cli in /tmp/_assets_redis_tar_gz1366317376
127.0.0.1:6379> SET mykey "Hello\nWorld"
OK
127.0.0.1:6379> GET mykey
"Hello\nWorld"
127.0.0.1:6379>
127.0.0.1:6379> exit
And after I stop them, the temp dir is removed, and they disappear. So there you go, ‘containers’ using only chroot.
The source is on github.
Who Cares?
So who cares? I mean, many container runtimes already exist (runC, containerd, gVisor, StarStruck) and they’re all better than this one in almost every way.
Well, it might just be me, but understanding that a container is just like a process that has been chrooted – so it’s running against the same operating system but with a different root – helps ground my knowledge of what containers are. It makes them seem less magical and lets me think about new possibilities.
And so containers are great. Namespaces, cgroups v2, runC, overlayfs, the OCI image format, and everything else in this space is impressive engineering. It’s incredible forward progress we can all take advantage of. But it’s not magic. It’s just a long series of progressive refinements (and a bit of marketing) on top of a feature that has been in Unix since … let me check:
> git log usr/src/libc/sys/chroot.s | head -5
commit a0b0c390d5f37060bf64b63bba8e9f0a1dceb337
Author: Dennis Ritchie <dmr@research.uucp>
Date: Wed Jan 10 14:59:44 1979 -0500
Research V7 development