The microkernel approach
Most of today's operating-system kernels, for example the Linux kernel, are highly complex pieces of software that contain everything needed to manage resources (e.g., memory), access the hardware, store information on a file system, handle network packets, and control user processes. Therefore, such a kernel requires the privilege to control the whole machine. The left side of the figure illustrates such a monolithic operating-system kernel.
The high functional requirements and the broad range of existing hardware cause such a kernel to grow huge. Users expect support for different network cards, different file systems, a wide variety of network protocols, and a lot more. Consequently, a typical Linux kernel contains far more than 500,000 lines of code. It is impossible to fully avoid bugs and security leaks in a system of such scale. Bug-prone code in the kernel can corrupt the proper operation of the whole system and can thus have fatal consequences. As illustrated on the right side of the figure, a broken network driver can corrupt the whole operating system.
With virtual address spaces, modern operating systems and hardware platforms provide mechanisms to isolate concurrently running user applications. Each user application runs within a dedicated address space and interacts in a safe, well-controlled manner with other user applications only via mechanisms provided by the kernel. This way, the kernel effectively protects user applications from each other.
Microkernel-based systems use these techniques not only for user applications but also for device drivers, file systems, and other typical kernel-level services. Therefore, the effect of a bug-prone component is locally restricted. The microkernel withdraws all unneeded privileges from each component and thereby shrinks the overall complexity of code running in privileged mode by an order of magnitude compared to a monolithic kernel. For example, a typical microkernel of the L4 family is implemented in fewer than 20,000 lines of code. As illustrated in the figure on the right, all components are protected from each other by address spaces. Thus, one component cannot inspect or corrupt other components without proper authorization. Communication between components can only happen by using communication mechanisms provided by the microkernel. If one component of the system gets corrupted by a bug or an attack, the fault is locally restricted to the broken component. Furthermore, the microkernel enforces CPU time scheduling and can grant guaranteed processing time to user processes. No de-privileged system component is able to violate such guarantees. Therefore, a microkernel can safely execute sensitive applications, de-privileged system services, and large untrusted applications side by side on one machine.
Whereas microkernels deliver fault isolation and separation of concerns well in theory, in practice this is only half of a solution, because all those de-privileged components must be appropriately organized to make the approach effective. Because, by definition, a microkernel does not implement policy but only mechanisms, the policy must be provided by someone else. One approach is the introduction of a central policy management component with a global view of the whole system, controlled by a specially privileged administrator. The complexity and manageability of a centralized policy, however, depend on the scale of the system.
In contrast, Genode extends the microkernel idea of decomposing the operating-system code to the idea of also decomposing the system policy, by imposing a strict organizational structure onto each part of the system.
For understanding the architecture of Genode, it is helpful to draw an analogy to the internal operation of a company. The key to how a company successfully scales with a growing number of employees is its organizational structure. The same applies to the scalability of an operating system with regard to the richness of its functionality. To keep a high number of employees manageable, successful companies implement policies in a distributed way. Rather than having one single person dictating everybody's tasks directly, policy decisions are taken and applied hierarchically. The heart of the Genode OS architecture is an organizational structure that loosely resembles the structure of a hierarchically organized company.
As illustrated in the figure on the right, Genode organizes processes as a tree. The red arrows symbolize that child processes are created out of the resources of their respective parents. When creating a child process, a parent fully defines the virtual environment in which the new process gets executed. The child, in turn, can further create children from its assigned resources, thereby creating an arbitrarily structured subsystem. Each parent maintains full control over the subsystems it created and defines their inter-relationship, for example by selectively permitting communication between them or by assigning physical resources. The parent-child interface is the same at each hierarchy level, which makes this organizational approach recursively applicable. Using this recursive property as a tool clears the way for separating policy and mechanism in very flexible ways and facilitates the strict separation of duties.
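The tree structure described above can be illustrated with a small toy model. The following Python sketch is not Genode's API. The class name Process, the method create_child, and the abstract "quota" units are assumptions made only for illustration. It shows the one property the paragraph stresses: a child is always created out of resources that its parent already owns, and the same interface applies at every level of the tree.

```python
class Process:
    """Hypothetical model of one node in a Genode-like process tree."""

    def __init__(self, name, quota, parent=None):
        self.name = name
        self.quota = quota        # resources currently owned (abstract units)
        self.parent = parent
        self.children = []

    def create_child(self, name, quota):
        """Create a child out of this process's own resources."""
        if quota < 0 or quota > self.quota:
            raise ValueError(f"{self.name} cannot donate {quota} units")
        self.quota -= quota       # the parent pays for its child
        child = Process(name, quota, parent=self)
        self.children.append(child)
        return child

    def subtree_quota(self):
        """Resources held by this process and everything below it."""
        return self.quota + sum(c.subtree_quota() for c in self.children)


# The same interface is used at every level (recursive structure).
root = Process("root", 100)
launcher = root.create_child("launcher", 60)   # root keeps 40 units
app = launcher.create_child("app", 25)         # launcher keeps 35 units
```

Because every unit held by a child was taken from its parent, the total held by the whole tree stays equal to what the root started with, no matter how deep the hierarchy grows.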
For an application of Genode's organizational structure to a real-world scenario, refer to Stefan Kalkowski's diploma thesis.
Principle of least privilege naturally applied
When created, a new process is only allowed to communicate with its immediate parent. It is otherwise completely isolated from the rest of the system. The parent can selectively assign communication rights to enable a child to communicate with the outside of the child's subsystem.
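As a minimal sketch of this rule, the following Python model gives each new process exactly one communication partner, its parent, and lets only the parent widen that set. All names (Process, permit, can_talk_to) are hypothetical and chosen for illustration. Real communication rights in Genode are not represented as Python sets.

```python
class Process:
    """Hypothetical model of communication rights in a process tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # A freshly created process may talk to its immediate parent only.
        self.peers = {parent} if parent is not None else set()
        self.children = []

    def create_child(self, name):
        child = Process(name, parent=self)
        self.children.append(child)
        return child

    def permit(self, child, target):
        """Policy decision of the parent: allow `child` to reach `target`."""
        if child not in self.children:
            raise PermissionError("only the parent may grant rights to a child")
        child.peers.add(target)

    def can_talk_to(self, other):
        return other in self.peers


init = Process("init")
driver = init.create_child("driver")
app = init.create_child("app")

before = app.can_talk_to(driver)   # no right granted yet
init.permit(app, driver)           # the parent selectively opens one path
after = app.can_talk_to(driver)
```

In this model the grant is one-directional: permitting the application to reach the driver does not allow the driver to initiate communication with the application.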
Maximizing isolation by avoiding global names
To maximize the isolation between unrelated subsystems, Genode does not rely on globally visible names or IDs. Following the idea that launching a directed attack is difficult if the target is invisible, Genode operates without the need for any visible global information such as file names, process IDs, thread IDs, device nodes, or port numbers. Genode is explicitly designed for kernels with support for kernel-protected localized names (capabilities).
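The idea of localized names can be sketched as follows. In this toy model, each process owns a private table that maps small local numbers to objects, and a reference can only be obtained by explicit delegation. The names CapabilitySpace, insert, lookup, and delegate are assumptions for illustration. In a capability-based kernel, such a table would be maintained and protected by the kernel rather than by user-level code.

```python
class Service:
    """Placeholder for any object that can be referred to."""

    def __init__(self, label):
        self.label = label


class CapabilitySpace:
    """Hypothetical per-process table of local names (selectors)."""

    def __init__(self):
        self._slots = {}
        self._next = 0

    def insert(self, obj):
        """Store a reference and return its selector, valid only locally."""
        selector = self._next
        self._next += 1
        self._slots[selector] = obj
        return selector

    def lookup(self, selector):
        if selector not in self._slots:
            raise KeyError(f"no capability at local selector {selector}")
        return self._slots[selector]

    def delegate(self, selector, receiver):
        """Pass a reference on; the receiver gets its own local selector."""
        return receiver.insert(self.lookup(selector))


fs = Service("file system")
space_a = CapabilitySpace()
space_b = CapabilitySpace()

space_a.insert(Service("timer"))          # occupies selector 0 in space_a
a_fs = space_a.insert(fs)                 # selector 1 in space_a
b_fs = space_a.delegate(a_fs, space_b)    # selector 0 in space_b
```

The same object is reachable under different numbers in the two spaces, and a number that is valid in one space means nothing in the other. There is no global identifier that an unrelated process could guess.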
Mastering complexity through application-specific trusted computing bases
Because software complexity correlates with the likelihood of bugs, having security-sensitive functionality depend on high-complexity software is risky. The term trusted computing base (TCB) was coined to describe the amount of code that must not be compromised to uphold security. In addition to the code of the sensitive application, the TCB comprises each system component that has direct or indirect control over the execution of the application (affecting availability and integrity) or that can access the processed information (affecting confidentiality and integrity). On monolithic OSes, the TCB complexity can be regarded as a global system property because it is dominated by the complexity of the kernel and the privileged processes, which are essentially the same for each concurrently executed application. On Genode, the amount of security-critical code can differ greatly for each application, depending on the position of the application within Genode's process tree and the services it uses. To illustrate the difference, an email-signing application executed on Linux has to rely on a TCB complexity of millions of lines of code (LOC). Most of the code, however, does not provide functionality required to perform the actual cryptographic function of the signing application. Still, the credentials of the user are exposed to an overly complex TCB including the network stack, device drivers, and file systems. In contrast, Genode allows the cryptographic function to be executed with a specific TCB that consists only of components that are needed to perform the signing function. For the signing application, the TCB would contain the microkernel (20 KLOC), the Genode OS framework (10 KLOC), a minimally complex GUI (2 KLOC), and the signing application (15 KLOC). These components add up to a complexity of less than 50,000 LOC.
Genode tailors the trusted computing base for each application individually. The figure on the right illustrates the TCB of the yellow-marked process. Naturally, it contains the hierarchy of parents and those processes that provide services used by the application (the left component at the third level).
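The rule stated above (the TCB contains the application, its chain of parents, and the providers of the services it uses) can be written down as a small graph traversal. The sketch below is a simplification made for illustration: the component names are hypothetical, the microkernel and the framework are modeled as plain ancestor nodes, and the sizes of the network stack and the browser are invented placeholders. Only the four KLOC figures of the signing scenario are taken from the text.

```python
def tcb(component, parent_of, uses):
    """Return the set of components that `component` has to trust."""
    trusted = set()
    pending = [component]
    while pending:
        current = pending.pop()
        if current in trusted:
            continue
        trusted.add(current)
        parent = parent_of.get(current)
        if parent is not None:
            pending.append(parent)              # every parent is trusted
        pending.extend(uses.get(current, ()))   # so is every used service
    return trusted


# Hypothetical system layout (who created whom, who uses which service).
parent_of = {
    "framework": "microkernel",
    "gui": "framework",
    "signer": "framework",
    "network_stack": "framework",
    "browser": "framework",
}
uses = {
    "signer": ["gui"],
    "browser": ["gui", "network_stack"],
}
# Sizes in KLOC; the last two values are made-up placeholders.
kloc = {
    "microkernel": 20, "framework": 10, "gui": 2, "signer": 15,
    "network_stack": 300, "browser": 1000,
}

signer_tcb = tcb("signer", parent_of, uses)
signer_kloc = sum(kloc[c] for c in signer_tcb)   # 20 + 10 + 2 + 15 = 47
```

The network stack and the browser exist in the same system but never enter the signer's TCB, because the signer neither descends from them nor uses their services.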
Trading and tracking of physical resources
The majority of current operating systems try to hide the fact that physical resources such as memory, network bandwidth, or graphics bandwidth are limited. For example, on such systems, processes expect to be able to allocate memory from an unlimited pool whereas the operating system puts a swapping mechanism in place to uphold the illusion of having unlimited memory. This approach, however, sacrifices deterministic system behaviour. Furthermore, resources used by the kernel on behalf of applications, for example the resources needed by device drivers, are not accounted to the end-using applications at all. To resort to the analogy of operating a company, giving departments the illusion of having unlimited resources and refusing to track resource usage would ultimately lead to unbalanced finances.
In contrast, Genode facilitates the explicit accounting of physical resources to processes. Similar to an upper-level manager who trades company resources among the departments under their management, each parent in Genode's process tree trades resources among its children. Genode enables the lending and regaining of physical resources between clients and servers according to a chain of command. For example, for the service of providing a GUI window, a GUI server can require the client to lend a negotiated amount of memory needed for managing the window during the window's lifetime. Genode provides this resource-trading mechanism across arbitrary hierarchy levels. Thereby, physical resources are correctly accounted for at all times. Even though Genode allows for dynamic workload and arbitrarily structured process trees, the system remains deterministic.
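The GUI-window example can be sketched as a pair of quota transfers. This is a toy model under stated assumptions, not Genode's session interface: the classes Account and GuiServer, the method names, and the fixed cost of 8 units per window are all hypothetical.

```python
class Account:
    """Hypothetical quota account of one process (abstract memory units)."""

    def __init__(self, name, quota):
        self.name = name
        self.quota = quota

    def transfer(self, to, amount):
        if amount < 0 or amount > self.quota:
            raise ValueError(f"{self.name} cannot transfer {amount} units")
        self.quota -= amount
        to.quota += amount


class GuiServer:
    WINDOW_COST = 8   # assumed cost of managing one window

    def __init__(self, account):
        self.account = account
        self.lent = {}   # client name -> amount lent for its window

    def open_window(self, client):
        # The client pays for the window; the server's own quota is untouched.
        client.transfer(self.account, self.WINDOW_COST)
        self.lent[client.name] = self.WINDOW_COST

    def close_window(self, client):
        # The lent memory returns to the client when the window goes away.
        amount = self.lent.pop(client.name)
        self.account.transfer(client, amount)


server_account = Account("gui_server", 4)
app = Account("app", 20)
gui = GuiServer(server_account)

gui.open_window(app)
during = (app.quota, server_account.quota)   # client lends 8 of its 20 units
gui.close_window(app)
afterwards = (app.quota, server_account.quota)
```

Since the server works on memory lent by its clients, a client cannot exhaust the server's own resources by opening many windows. It can only spend what it was given by its parent.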
Virtualization-enabled application compatibility
Because operating systems without applications are barely useful, compatibility with existing applications is a major concern. In the past, the concern of losing compatibility often prevented design legacies from being disposed of. Modern virtualization technology is the key to overcoming this problem. In a preliminary study, a user-level version of the Linux kernel (L4Linux) was successfully ported to the Genode OS Framework running on an L4 kernel. This study suggests that Genode combined with an appropriate kernel or hypervisor is suited as a virtual-machine hosting platform. As illustrated in the figure, from an organizational point of view, a virtual machine is implemented as a leaf node in Genode's process tree. Genode not only facilitates the use of virtual machines for application compatibility but also the re-use of existing device drivers. The baby Tux in the figure symbolizes an original Linux device driver being executed in a device-driver environment.