Making locally executing code think it’s running on k8s

“why mirrord hooks libc calls to enable faster debugging of cloud-native apps”

Debugging, like programming in general, often seems to be more of an art than a
science; figuring out exactly where your code breaks down is likely to involve
a significant amount of trial-and-error. You make a small change, recompile if
applicable, execute the program against some input known to be problematic and
observe its behaviour. Should you need additional visibility on what’s going on
inside, you use a debugger to step through each line of code and inspect the
state of variables.

This is all fine and dandy when working ‘locally’, i.e. when you can carry out
the above process entirely on your workstation. You could be working on a
command-line Rust application, in which case things might be as easy as running
`cargo build` and invoking the resulting executable with the right arguments. Or
you might be working on the front-end for a website in React.js, in which case a
simple `npm start` will enable ‘live coding’, where any change you make to the
source will automatically be loaded in your browser. Whatever local development
you’re doing, odds are you have an IDE at your disposal with an integrated,
visual debugger, so operations like setting breakpoints and stepping through
your program are mere clicks away.

You’re not so lucky when your application is intended to run on kubernetes (or
any other container orchestration platform, for that matter). Implementing the
above pattern now unavoidably involves additional network operations; in most
cases, pushing and pulling newly built container images to and from a registry.
While it is certainly possible to automate this to the point where it requires
no more user interaction than pressing the ‘run’ button in an IDE, it will
inexorably slow down the feedback loop.

Furthermore, once the new version of the code has been deployed, attaching a
debugger to it is no longer a trivial affair. In the case of a container running
on kubernetes, it will at least require some port forwarding and remote
debugging support. Again, this can all be abstracted away behind UI controls or
shell scripts (I have written such scripts myself), but setting up such
automation takes additional work. Not to mention the hassle of ensuring
everyone on the development team is on the same page and able to take advantage
of it. Developer experience may vary, to put it mildly.

But why bother deploying the code in the first place? Why should we need remote
debugging? Unless you’re cross-compiling for another architecture, it’s always
possible to run the program locally, thereby avoiding all of the above
time-consuming complications. Being able to run a program locally, however,
does not mean it will behave correctly. Given that we’re discussing kubernetes
applications, they are probably part of a distributed microservice architecture.
The application probably expects to be able to connect to other services, or to
receive requests from upstream systems; all of this breaks down when any such
microservice isn’t running on a cluster, in the right environment.

Be that as it may, if you’re a well-adjusted developer, you’re probably already
writing tests. Many such tests execute locally by design, and to circumvent the
aforementioned issue, we invented mocking. Indeed, mocking is essential in any
kind of automated test suite, and such suites are a cornerstone of sound
software engineering. Through mocking, then, we can make the bit of code we
want to debug believe it is running in the ‘real’ environment it expects, the
very environment in which running it would make it difficult to debug in the
first place. Problem solved!

Alas, were this the definitive answer, I could wrap up this post right here.
There are, of course, still a couple of issues. First of all, every experienced
developer knows that tests sometimes take longer to write than the code they are
meant to test (which often seems ample justification for omitting them).
Secondly, no test suite is ever exhaustive. It may not be possible or feasible
to predict what kind of inputs a service may end up receiving, so mocking can
only take you so far. In the end, your application will run in a cluster among
other services it interacts with, and only there will you be able to observe
all (potentially buggy) edge cases.

So what if we could make our code believe it is running in our cluster, without
actually mocking anything? In other words, what if we could run an application
locally, but still connect it to all resources it has to interact with inside
a cluster, without it ever knowing? This is where the mirrord project enters the
picture. I realise it has taken me long enough to bring up the subject mentioned
in the subtitle to this post, but I believe the picture I have been painting
thus far is necessary to fully appreciate how mirrord attacks this problem from
a unique angle, and why that angle matters.

In short, mirrord does what I alluded to above: it allows you to run an
application on your local workstation, while transparently making that
application ‘believe’ it is actually running in the cluster environment it’s
designed for. It achieves this by intercepting all of the program’s I/O
operations and mirroring them to an actual containerized process running in a
kubernetes pod. Symmetrically, any I/O targeted at said containerized process is
also mirrored back to the local process. The pod, then, is configured exactly
as expected by the application in question.

The result is quite powerful: your local process can read or write any files
mounted into the pod, as expected. It can read environment variables injected
from config maps, as expected. It can perform service discovery through
cluster DNS, as expected. Other services can call its endpoints, it can respond
and even make network calls of its own, all as expected. There is no mocking, no
special application configuration to target the mocks, nothing. All while you
can simply attach a debugger to the process or read stdout in your terminal.

In his talk at KubeCon Europe 2024, MetalBear software engineer Tal Zwick
explained in (some) gory detail how mirrord is capable of this. More
specifically, he dove into the tricks his team had to pull to intercept all of
the I/O operations, of any program. Mirrord makes no distinction between
programming languages or frameworks, because it operates very close to the one
level all of them have in common: machine code. I say close, because it
actually works at the level of libc.

Let’s take a step back. All software, be it a Python script or an old ANSI C
program, must inevitably end up as a series of binary values loaded into the
instruction register of a CPU. The translation can happen directly, by compiling
and assembling, or indirectly, through an intermediate runtime or interpreter.
Assuming we’re talking about normal, user space programs—if not, I probably have
little new information to offer you in these few paragraphs—the machine code of
the program executes while the CPU is in ‘unprivileged’ or user mode. This
means the CPU will only accept instructions that read or write memory, or do
logic and arithmetic operations.

To do anything else, including any I/O, the program must ‘interrupt’ the CPU
using a special instruction. The CPU will then go into privileged mode and be
able to perform I/O (among other things), but it will start executing the
operating system’s machine code. It is then up to the OS to figure out what the
user program wants, which is most often done by inspecting the state the user
program left certain registers in. This entire ordeal is known as a system call,
and it is the principal method for user space programs to request functionality
from the OS.

Today, very few programs use system calls directly. Such code is both
architecture-specific and OS-specific. Luckily, there has been a cross-platform
solution since the eighties: the C standard library (libc). Most general-purpose
operating systems you’re likely to come across nowadays (Linux, Windows and
macOS/BSD are the obvious ones) are written in C, as their predecessors have
been for the past 40 years. Almost all provide an implementation of this C
library, a standardized collection of routines any user space program can call
to interact with the OS instead of making system calls itself; the libc
function then performs the appropriate system call on the program’s behalf.
This means that a program which calls libc functions can be compiled and
executed on any OS that provides a suitable implementation. It is not hard to
see why most programs have been written this way ever since.

This is also true for most programs written in other languages; the compilers,
interpreters and standard libraries had to first be written in another language.
More often than not, this language was C, and for the reasons mentioned above,
they leveraged libc and obtained a cross-platform framework (although I’m
undoubtedly oversimplifying a bit here). For the sake of argument, though,
consider a Java program. When it is executed, the OS is really executing a JVM,
say HotSpot, which executes the program in turn. The Java program itself may not
be using libc, but the HotSpot software was largely written in C++, which does
use libc. A similar argument holds for programs written in Rust, Node.js,
Python, Kotlin, Ruby, … but not for Go on Linux, it turns out. (As Zwick
pointed out in his talk, the Go runtime for Linux does in fact implement system
calls directly.)

At the level of executables, this means that some machine code of the libc
implementation will almost always be loaded into process memory when running any
program. There, the application code can call those routines. The libc
functions’ instructions will still execute in unprivileged mode (i.e. user
space), like all other code in the program. It is here that mirrord can work
its magic. To see how, we must delve a bit deeper still.

When a program calls a libc function, it executes some kind of ‘jump’
instruction. This instruction essentially tells the CPU
“don’t just load and execute the next instruction from memory, but instead
continue loading and executing instructions starting at this particular
address.” On old computer systems, these addresses were fixed, physical memory
locations determined during compilation. If you didn’t carefully load a program
at the right memory address, it would bug out as soon as a (non-relative) jump
instruction was executed. Nowadays, user space programs use virtual memory, and
jumps to functions in external libraries, like libc, may point to arbitrary,
dynamic addresses.

Upon execution, then, it is the job of the loader (ld.so on Linux) to figure
out what addresses have to be used for such dynamic jump instructions.
Executable files don’t contain those addresses: they use placeholder symbols
which have to be resolved by the loader. The machine code these symbols point
to could be located outside the executable, in shared objects (or DLLs on
Windows). This is known as dynamic linking. The loader will ensure those bits
of code are loaded into memory (possibly sharing it with other processes
through the magic of virtual memory), and that the jumps in the main program
are ‘patched’ with whatever virtual addresses the symbol-referenced functions
end up occupying.

Almost all software is dynamically linked to libc like this. (The alternative is
static linking, where the used parts of libc are included in the executable.
This is mostly wasteful and impractical.) It is here, by manipulating the
addresses the loader obtains for dynamically linked libc I/O functions, that
mirrord gets its foot in the door.

On Linux and macOS/BSD, you can ask the loader to ‘preload’ a shared library
before proceeding with the usual operations for executing a program. If you’re
sneaky, and preload something that exports some of the same symbols as libc,
you can effectively make the loader patch any program’s jumps to libc with the
addresses of your own functions! This is colloquially known as the ‘LD_PRELOAD
trick’ on Linux, in this case being applied to the standard C library.

Now we’re playing with power, as the well-known Nintendo commercial proclaimed.
By hooking libc like this, mirrord is able to arbitrarily alter the behaviour
of any I/O operation attempted by any compiled or interpreted program, written
in any language, using any framework. The only exceptions would be the
relatively rare cases of libraries and runtimes which directly use system calls;
Golang on Linux being one such exception, which had to be supported using a
modified runtime.

The final part of the puzzle is what mirrord does with the I/O operations it
intercepts. Conceptually, this is quite straightforward: it relays them to a pod
running on the kubernetes cluster, where the containerized process performs the
corresponding libc call. To the LD_PRELOADed application, all of its
interactions with the outside world, going through libc, produce results as if
the application is in fact running in said pod. Again, this works transparently
regardless of what convoluted intermediate paths the application code takes to
get down to libc.

I am, obviously, glossing over lots of complexity here. Mirrord provides an
operator to orchestrate the ‘mirror’ pods on the cluster, most intercepted calls
require special treatment, it has to figure out a way to allow I/O operations to
propagate back from the pod to our local application… In addition, since the
end goal of all this is improved developer experience, the project has to
provide plugins for popular text editors and IDEs. If all of the magic couldn’t
be hidden behind a simple UI button, mirrord would have little to offer over the
alternatives we discussed earlier.

The fact that it does, and in particular how this is achieved, is deeply
fascinating to me. It is a great example of a very high-level UX problem
with a very low-level solution. It prompted the software engineers at MetalBear
to cut vertically through all layers of our modern, cloud-native technology
stack, venturing straight down into libc, the final trench behind
which lurks the underworld that is kernel space. Only there, by tapping into a
handful of interfaces standardized back in the eighties, could they obtain a
solution that can make virtually any piece of software written for general
purpose hardware believe it is running somewhere it isn’t.
