Component composition
Genode provides a playground for combining components in many different ways. The best composition of components often depends on the goal of the system integrator. Among possible goals are the ease of use for the end user, the cost-efficient reuse of existing software, and good application performance. However, the most prominent goal is the mitigation of security risks. This section presents composition techniques that leverage Genode's architecture to dramatically reduce the trusted computing base of applications and to solve rather complicated problems in surprisingly easy ways.
The figures presented throughout this section use a simpler nomenclature than the previous sections. A component is depicted as box. Parent-child relationships are represented as light-gray arrows. A session between a client and a server is illustrated by a dashed arrow pointing to the server.
Sandboxing
The functionality of existing applications and libraries is often worth reusing or economically downright infeasible to reimplement. Examples are PDF rendering engines, libraries that support commonly used video and audio codecs, or libraries that decode hundreds of image formats.
However, code of such rich functionality is inherently complex and must be assumed to contain security flaws. This is empirically evidenced by the never ending stream of security exploits targeting the decoders of data formats. But even in the absence of bugs, the processing of data by third-party libraries may have unintended side effects. For example, a PDF file may contain code that accesses the file system, which the user of a PDF reader may not expect. By linking such a third-party library to a security-critical application, the application's security is seemingly traded against the functional value that the library offers.
Fortunately, Genode's architecture principally allows every component to encapsulate untrusted functionality in child components. So instead of directly linking a third-party library to an application, the application executes the library code in a dedicated sub component. By imposing a strict session-routing policy onto the component, the untrusted code is restricted to its sandbox. Figure 2 shows a video player as a practical example of this approach.
The video player uses the nitpicker GUI server to present a user interface with the graphical controls of the player. Furthermore, it has access to a media file containing video and audio data. Instead of linking the media-codec library (libav) directly to the video-player application, it executes the codec as a child component. Thereby the application effectively restricts the execution environment of the codec to only those resources that are needed by the codec. Those resources are the media file that is handed out to the codec as a ROM module, a facility to output video frames in the form of a GUI session, and a facility to output an audio stream in the form of an audio-out session.
In order to reuse as much code as possible, the video player executes an existing example application called avplay that comes with the codec library as child component. The avplay example uses libSDL as back end for video and audio output and responds to a few keyboard shortcuts for controlling the video playback such as pausing the video. Because there exists a Genode version of libSDL, avplay can be executed as a Genode component with no modifications. This version of libSDL requests a GUI session (Section GUI) and an audio-out session (Section Audio output) to perform the video and audio output and respond to user input. Furthermore, it opens a ROM session for obtaining a configuration. This configuration parametrizes the audio back end of libSDL. Because avplay is a child of the video-player application, all those session requests are directed to the application. It is entirely up to the application how to respond to those requests. For accommodating the request for a GUI session, the application creates a second GUI session, configures a virtual framebuffer, and embeds this virtual framebuffer into its GUI. It keeps the GUI session capability for itself and merely hands out the virtual GUI's session capability to avplay. For accommodating the request for the input stream, it hands out a capability to a locally-implemented input stream. Using this input stream, it becomes able to supply artificial input events to avplay. For example, when the user clicks on the play button of the application's GUI, the application would submit a sequence of press and release events to the input stream, which appear to avplay as the keyboard shortcut for starting the playback. To let the user adjust the audio parameters of libSDL during playback, the video-player application dynamically changes the avplay configuration using the mechanism described in Section Dynamic component reconfiguration at runtime. As a response to a configuration update, libSDL's audio back end picks up the changed configuration parameters and adjusts the audio playback accordingly.
By sandboxing avplay as a child component of the video player, a bug in the video or audio codecs can no longer compromise the application. The execution environment of avplay is tailored to the needs of the codec. In particular, it does not allow the codec to access any files or the network. In the worst case, if avplay becomes corrupted, the possible damage is restricted to producing wrong video or audio frames but a corrupted codec can neither access any of the user's data nor can it communicate to the outside world.
Component-level and OS-level virtualization
The sandboxing technique presented in the previous section tailors the execution environment of untrusted third-party code by applying an application-specific policy to all session requests originating from the untrusted code. However, the tailoring of the execution environment by the parent can even go a step further by providing the all-encompassing virtualization of all services used by the child, including core's services such as PD, CPU, and LOG. This way, the parent can not just tailor the execution environment of a child but completely define all aspects of the child's execution. This clears the way for introducing custom operating-system interfaces at any position within the component tree, or for monitoring the behavior of subsystems.
Introducing a custom OS interface
By intercepting all session interfaces normally provided by core, a runtime environment becomes able to handle all low-level interactions of the child with core. This includes the allocation of memory using the PD service, the spawning and controlling of threads using the CPU service, and the management of the child's address space using the PD service.
This flexibility paves the ground for hosting traditional operating-system interfaces such as Unix as a mere user-level construct within a Genode system. Normally, several aspects of Unix would contradict with Genode's architecture:
-
The Unix system-call interface supports files and sockets as first-level citizens.
-
There is no global virtual file system in Genode.
-
Any Unix process can allocate memory as needed. There is no necessity for explicit assignment of memory resources to Unix processes. In contrast, Genode employs the rigid accounting of physical resources as explained in Section Resource trading.
-
Processes are created by forking existing processes. The new process inherits the roles (in the form of open file descriptors) of the forking process.
Figure 3 illustrates a custom Unix runtime environment that bridges these gaps by using Genode's building blocks.
-
The VFS server is able mount TAR archives locally as a virtual file system and offers the content as a file-system service. Furthermore, the VFS server exposes a terminal session as a pseudo file. In the depicted scenario, the terminal session request is routed to the parent of init.
-
The fs_rom component provides a ROM service by fetching the content of ROM modules from a file system. By connecting the fs_rom with the VFS component, the files of the bash.tar and vim.tar archives become available as ROM modules. With the bash executable binary accessible as ROM module, it can be executed as a Genode component.
-
The init component allows one to stick components together and let the result appear to the surrounding system as a single component. It is used to host the composition of the VFS, fs_rom, and bash.
-
The bash shell can spawn child processes such as Vim by relying of traditional Unix interfaces, namely fork and execve. In contrast to regular Unix systems, however, the underlying mechanisms are implemented as part of the C runtime with no kernel support or special privileges needed. In the depicted scenario, bash plays the role of a runtime environment for Vim. Since bash is a parent of Vim, it is able to respond to Vim's resource demands by paying out of its own pocket, thereby softening Genode's rigid resource accounting to accommodate Vim's expectations.
Monitoring the behavior of subsystems
Besides hosting arbitrary OS personalities as a subsystem, the interception of core's services allows for the all-encompassing monitoring of subsystems without the need for special support in the kernel. This is useful for failsafe monitoring or for user-level debugging.
As described in Section Component creation, any Genode component is created out of low-level resources in the form of sessions provided by core. Those sessions include at least a PD session, a CPU session, and a ROM session with the executable binary as depicted in Figure 4. In addition to those low-level sessions, the component may interact with sessions provided by other components.
For debugging a component, a debugger would need a way to inspect the internal state of the component. As the complete internal state is usually known by the OS kernel only, the traditional approach to user-level debugging is the introduction of a debugging interface into the kernel. For example, Linux has the ptrace mechanism and several microkernels of the L4 family come with built-in kernel debuggers. Such a debugging interface, however, introduces security risks. Besides increasing the complexity of the kernel, access to the kernel's debugging mechanisms needs to be strictly subjected to a security policy. Otherwise any program could use those mechanisms to inspect or manipulate other programs. Most L4 kernels usually exclude debugging features in production builds altogether.
In a Genode system, the component's internal state is represented in the form of core sessions. Hence, by intercepting those sessions of a child, a parent can monitor all interactions of the child with core and thereby record the child's internal state. Figure 5 shows a scenario where a debug monitor executes a component (debugging target) as a child while intercepting all sessions to core's services. The interception is performed by providing custom implementations of core's session interfaces as locally implemented services. Under the hood, the local services realize their functionality using actual core sessions. But by sitting in the middle between the debugging target and core, the debug monitor can observe the target's internal state including the memory content, the virtual address-space layout, and the state of all threads running inside the component. Furthermore, since the debug monitor is in possession of all the session capabilities of the debugging target, it can manipulate it in arbitrary ways. For example, it can change thread states (e.g., pausing the execution or enable single-stepping) and modify the memory content (e.g., inserting breakpoint instructions). The figure shows that those debugging features can be remotely controlled over a terminal connection.
Using this form of component-level virtualization, a problem that used to require special kernel additions in traditional operating systems can be solved via Genode's regular interfaces.
Interposing individual services
The design of Genode's fundamental services, in particular resource multiplexers, is guided by the principle of minimalism. Because such components are security critical, complexity must be avoided. Functionality is added to such components only if it cannot be provided outside the component.
However, components like the nitpicker GUI server are often confronted with feature requests. For example, users may want to move a window on screen by dragging the window's title bar. Because nitpicker has no notion of windows or title bars, such functionality is not supported. Instead, nitpicker moves the burden to implement window decorations to its clients. However, this approach sacrifices functionality that is taken for granted on modern graphical user interfaces. For example, the user may want to switch the application focus using a keyboard shortcut or perform window operations and the interactions with virtual desktops in a consistent way. If each application implemented the functionality of virtual desktops individually, the result would hardly be usable. For this reason, it is tempting to move window-management functionality into the GUI server and to accept the violation of the minimalism principle.
The nitpicker GUI server is not the only service challenged by feature requests. The problem is present even at the lowest-level services provided by core. Core's region-map mechanism is used to manage the virtual address spaces of components via their respective PD sessions. When a dataspace is attached to a region map, the region map picks a suitable virtual address range where the dataspace will be made visible in the virtual address space. The allocation strategy depends on several factors such as alignment constraints and the address range that fits best. But eventually, it is deterministic. This contradicts the common wisdom that address spaces shall be randomized. Hence core's PD service is challenged with the request for adding address-space randomization as a feature. Unfortunately, the addition of such a feature into core raises two issues. First, core would need to have a source of good random numbers. But core does not contain any device drivers where to draw entropy from. With weak entropy, the randomization might be not random enough. In this case, the pretension of a security mechanism that is actually ineffective may be worse than not having it in the first place. Second, the feature would certainly increase the complexity of core. This is acceptable for components that potentially benefit from the added feature, such as outward-facing network applications. But the complexity eventually becomes part of the TCB of all components including those that do not benefit from the feature.
The solution to those kind of problems is the enrichment of existing servers by interposing their sessions. Figure 6 shows a window manager implemented as a separate component outside of nitpicker. Both the nitpicker GUI server and the window manager provide the nitpicker session interface. But the window manager enriches the semantics of the interface by adding window decorations and a window-layout policy. Under the hood, the window manager uses the real nitpicker GUI server to implement its service. From the application's point of view, the use of either service is transparent. Security-critical applications can still be routed directly to the nitpicker GUI server. So the complexity of the window manager comes into effect only for those applications that use it.
The same approach can be applied to the address-space randomization problem. A component with access to good random numbers may provide a randomized version of core's PD service. Outward-facing components can benefit from this security feature by having their PD session requests routed to this component instead of core.
Ceding the parenthood
When using a shell to manage subsystems, the complexity of the shell naturally becomes a security risk. A shell can be a text-command interpreter, a graphical desktop shell, a web browser that launches subsystems as plugins, or a web server that provides a remote administration interface. What all those kinds of shells have in common is that they contain an enormous amount of complexity that can be attributed to convenience. For example, a textual shell usually depends on libreadline, ncurses, or similar libraries to provide a command history and to deal with the peculiarities of virtual text terminals. A graphical desktop shell is even worse because it usually depends on a highly complex widget toolkit, not to mention using a web browser as a shell. Unfortunately, the functionality provided by these programs cannot be dismissed as it is expected by the user. But the high complexity of the convenience functions fundamentally contradicts the security-critical role of the shell as the common parent of all spawned subsystems. If the shell gets compromised, all the spawned subsystems will suffer.
The risk of such convoluted shells can be mitigated by moving the parent role for the started subsystems to another component, namely a loader service. In contrast to the shell, which should be regarded as untrusted due it its complexity, the loader is a small component that is orders of magnitude less complex. Figure 7 shows a scenario where a web browser is used as a shell to spawn a Genode subsystem. Instead of spawning the subsystem as the child of the browser, the browser creates a loader session. Using the loader-session interface described in Section Loader, it can initially import the to-be-executed subsystem into the loader session and kick off the execution of the subsystem. However, once the subsystem is running, the browser can no longer interfere with the subsystem's operation. So security-sensitive information processed within the loaded subsystem are no longer exposed to the browser. Still, the lifetime of the loaded subsystem depends on the browser. If it decides to close the loader session, the loader will destroy the corresponding subsystem.
By ceding the parenthood to a trusted component, the risks stemming from the complexity of various kinds of shells can be mitigated.
Publishing and subscribing
All the mechanisms for transferring data between components presented in Section Inter-component communication have in common that data is transferred in a peer-to-peer fashion. A client transfers data to a server or vice versa. However, there are situations where such a close coupling of both ends of communication is not desired. In multicast scenarios, the producer of information desires to propagate information without the need to interact (or even depend on a handshake) with each individual recipient. Specifically, a component might want to publish status information about itself that might be useful for other components. For example, a wireless-networking driver may report the list of detected wireless networks along with their respective SSIDs and reception qualities such that a GUI component can pick up the information and present it to the user. Each time, the driver detects a change in the ether, it wants to publish an updated version of the list. Such a scenario could principally be addressed by introducing a use-case-specific session interface, i.e., a "wlan-list" session. But this approach has two disadvantages.
-
It forces the wireless driver to play an additional server role. Instead of pushing information anytime at the discretion of the driver, the driver has to actively support the pulling of information from the wlan-list client. This is arguably more complex.
-
The wlan-list session interface ultimately depends on the capabilities of the driver implementation. If an alternative wireless driver is able to supplement the list with further details, the wlan-list session interface of the alternative driver might look different. As a consequence, the approach is likely to introduce many special-purpose session interfaces. This contradicts with the goal to promote the composability of components as stated at the beginning of Section Common session interfaces.
As an alternative to introducing special-purpose session interfaces for addressing the scenarios outlined above, two existing session interfaces can be combined, namely ROM and report.
Report-ROM server
The report-rom server is both a ROM service and a report service. It acts as an information broker between information providers (clients of the report service) and information consumers (clients of the ROM service).
To propagate its internal state to the outside, a component creates a report session. From the client's perspective, the posting of information via the report session's submit function is a fire-and-forget operation, similar to the submission of a signal. But in contrast to a signal, which cannot carry any payload, a report is accompanied with arbitrary data. For the example above, the wireless driver would create a report session. Each time, the list of networks changes, it would submit an updated list as a report to the report-ROM server.
The report-ROM server stores incoming reports in a database using the client's session label as key. Therefore, the wireless driver's report will end up in the database under the name of the driver component. If one component wishes to post reports of different kinds, it can do so by extending the session label by a component-provided label suffix supplied as session-construction argument (Section Report). The memory needed as the backing store for the report at the report-ROM server is accounted to the report client via the session-quota mechanism described in Section Trading memory between clients and servers.
In its role of a ROM service, the report-ROM server hands out the reports stored in its database as ROM modules. The association of reports with ROM sessions is based on the session label of the ROM client. The configuration of the report-ROM server contains a list of policies as introduced in Section Server-side policy selection. Each policy entry is accompanied with a corresponding key into the report database.
When a new report comes in, all ROM clients that are associated with the report are informed via a ROM-update signal (Section Read-only memory (ROM)). Each client can individually respond to the signal by following the ROM-module update procedure and thereby obtain the new version of the report. From the client's perspective, the origin of the information is opaque. It cannot decide whether the ROM module is provided by the report-ROM server or an arbitrary other ROM service.
Coming back to the wireless-driver example, the use of the report-ROM server effectively decouples the GUI application from the wireless driver. This has the following benefits:
-
The application can be developed and tested with an arbitrary ROM server supplying an artificially created list of networks.
-
There is no need for the introduction of a special-purpose session interface between both components.
-
The wireless driver can post state updates in an intuitive fire-and-forget way without playing an additional server role.
-
The wireless driver can be restarted without affecting the application.
Poly-instantiation of the report-ROM mechanism
The report-ROM server is a canonical example of a protocol stack (Section Protocol stacks). It performs a translation between the report-session interface and the ROM-session interface. Being a protocol stack, it can be instantiated any number of times. It is up to the system integrator whether to use one instance for gathering the reports of many report clients, or to instantiate multiple report-ROM servers. Taken to the extreme, one report-ROM server could be instantiated per report client. The routing of ROM-session requests restricts the access of the ROM clients to the different instances. Even in the event that the report-ROM server is compromised, the policy for the information flows between the producers and consumers of information stays in effect.
Feedback control system
By combining the techniques presented in Sections Ceding the parenthood and Publishing and subscribing, a general pattern of a feedback-control system emerges (Figure 8).
This pattern achieves a strict separation of policy from functionality by employing a dynamically configured init component (dynamic init) in tandem with a management component (manager). The manager (1) monitors the state of the dynamic init and its children, and (2) feeds the dynamic init with configurations. Both the dynamic init and the manager are siblings within another (e.g., the static initial) init instance. The state report and the config ROM are propagated via the report-ROM component as presented in Section Publishing and subscribing. Hence, the manager and the dynamic init are loosely coupled. There is no client-server dependency in either direction.
The init component supports the reporting of its current state including the state of all children. Refer to Section State reporting for an overview of the reporting options. The report captures, among other things, the resource consumption of each child. For example, should a child overstep its resource boundaries, the report's respective <child> node turns into this:
<state> .. <child name="system-shell" ...> <ram ... requested="2M"/> .. </child> </state>
If the requested attribute is present, the child got stuck in a resource request. In the example above, the "system-shell" asks for 2M of additional memory. To resolve this situation, the manager can generate a new configuration for the dynamic init. In particular, it can
-
Adjust the resource quota of the resource-starved child. When the dynamic init observes such a configuration change, it answers the resource request and thereby prompts the child to continue its execution.
-
Restart the child by incrementing a version attribute of the child node. Once the dynamic init observes a change of this attribute, the child is killed and restarted.
In addition to responding to resource requests, the manager component can also evaluate other parts of the report. The two most interesting bits of information are the exit state of each child (featuring the exit code) and the health. The health is the child's ability to respond to external events. It is described in more detail at Section Component health monitoring. It effectively allows the manager to implement watchdog functionality.
In contrast to the potentially highly complex child hosted within the dynamic init, the manager component is much less prone to bugs. All it does is consuming reports (parsing some XML) and generating configurations (generating XML). It does not not need any C runtime, file system, or I/O drivers. It can be implemented without any dynamic memory allocations. In contrast to the complex child, which can just expected to be flaky, the manager and the dynamic init are supposed to be correct and trustworthy. Since the functionality of the dynamic init is present in the regular init component, which is part of any Genode system's trusted computing base anyway, the feedback-control pattern does not add new critical code complexity.
Examples
-
The sequential execution of multiple automated tests integrated in single system scenario requires the monitoring and parsing of test results and the orchestration of the test sequence. The depot-autopilot scenario at gems/run/depot_autopilot.run solves these problems by applying the described pattern.
-
The combination of a dynamic init with a manager can be found in advanced test scenarios such as the init test (os/recipes/pkg/test-init).
-
To enable one system image to run on a variety of hardware configurations, the dynamic probing of devices and starting of appropriate device drivers is needed. The device-driver subsystem of Sculpt OS solves this problem with the manager component gems/src/app/driver_manager/. For example, the driver manager evaluates the information reported by the PCI-bus driver to conditionally start the most suitable graphics driver.
-
For the on-target installation of packages, the so-called depot-download subsystem uses a dynamic init that successively downloads, verifies, extracts, and parses archives. The manager of this subsystem is located at gems/src/app/depot_download_manager/.
-
The arguably most sophisticated example is the sculpt manager that controls the runtime environment of the Sculpt operating system. It is located at gems/src/app/sculpt_manager/.